REVIEW article

The role of eye gaze in regulating turn taking in conversations: a systematized review of methods and findings.

Ziedune Degutyte

  • Samsung AI Center, Cambridge, United Kingdom

Eye gaze plays an important role in communication, but understanding of its actual function or functions, and of the methods used to elucidate them, has varied considerably. This systematized review was undertaken to summarize both the proposed functions of eye gaze in conversations of healthy adults and the methodological approaches employed. The eligibility criteria were restricted to a healthy adult population and excluded studies that manipulated eye gaze behavior. A total of 29 articles (quantitative, qualitative, and mixed methods) were returned, with a wide range of methodological designs. The main areas of variability related to the number of conversants, their familiarity and status, conversation topic, data collection tools (video and eye tracking), and definitions of eye gaze. The findings confirm that eye gaze facilitates turn yielding, plays a role in speech monitoring, prevents and repairs conversation breakdowns, and facilitates intentional and unintentional speech interruptions. These findings were remarkably consistent given the variability in methods across the 29 articles. However, in relation to turn initiation, the results were less consistent, requiring further investigation. This review provides a starting point for future studies to make informed decisions about study methods for examining eye gaze and selecting variables of interest.

Introduction

Human beings have evolved complex social-cognitive skills which enable us to exchange knowledge and communicate in multiple ways ( Herrmann et al., 2007 ). People exchange verbal, vocal [e.g., tone of voice; ( Lerner, 2004 )] and non-verbal [e.g., eye gaze, gestures, facial expressions ( Kendon, 1967 ; Bavelas and Chovil, 2000 )] behaviors that convey meanings, intentions, and information. Non-verbal behavior can enrich conversation by adding extra information, or revealing emotional states that are not expressed verbally ( Choi et al., 2005 ). Eye gaze in particular has been identified as playing a key role in communication, with infants showing a preference for direct gaze from birth ( Farroni et al., 2002 ). The role that eye gaze plays in social interaction has been studied across a variety of fields, including typical and atypical child development ( Baron-Cohen, 1997 ; Morales et al., 2000 ), mental health conditions [including schizophrenia ( Dowiasch et al., 2016 ); posttraumatic stress disorder ( Lazarov et al., 2019 ), and bipolar disorder ( Purcell et al., 2018 )], primates ( Ryan et al., 2019 ) and human-robot interaction ( Admoni and Scassellati, 2017 ). Additionally, eye gaze has been studied with different theoretical and methodological approaches from neuroscience ( Sato et al., 2016 ) to sociology ( McCarthy et al., 2008 ), producing a rich variety of data but complicating the conclusions that can be drawn about the role of eye-gaze in conversation.

Pioneering research conducted by Kendon (1967) suggested that eye gaze is used to regulate and monitor turn taking. Specifically, Kendon proposed that speakers tend to avert their gaze at the start of their turn in order to concentrate and plan their speech or to indicate that they are now holding the floor. He further proposed that in a two-person conversation, at the end of their turn the speaker gazes at the listener to indicate the end of their turn and to seek information about the listener's availability to speak next ( Kendon, 1967 ).

A decade later Kendon's research was challenged in studies by Rutter et al. (1978) and Beattie (1978 , 1979) . Whilst Rutter et al. (1978) also found that at the end of the turns, speakers tended to gaze at the listener in a dyadic situation, they argued that in order to claim that eye gaze has a role in turn taking, the gaze pattern should follow three rules. Firstly, speakers should be looking at their conversation partners more at the end of their turns than at the beginning, because at the start of the turn the speakers should be gazing away to concentrate and plan their speech ( Rutter et al., 1978 ). Secondly, at the end of one speaker's turn, the conversation partners should share a high level of mutual gaze, because in order for a speaker to pass the turn the listener should be available to receive it ( Rutter et al., 1978 ). Finally, there should be higher levels of mutual gaze between conversation partners at the end of the turns rather than at the start of the new turns, because a new speaker at the start of their turn, should start gazing away to concentrate ( Rutter et al., 1978 ). To test these predictions, Rutter et al. (1978) carried out two studies of which the first failed to support these three rules, and the second provided only partial support.

However, Rutter et al.'s (1978) approach to data analysis differed from Kendon's (1967) making direct comparison difficult. For example, to test the first rule—that the speakers should be looking at their conversation partners more at the end of their turns than at the beginning (because at the start of the turn the speakers should be gazing away to concentrate and plan their speech)— Rutter et al. (1978) compared the number of turns in which the speaker was looking at the listener at the start of a new utterance with the number of turns in which the speaker was looking at the listener at the end of the old utterance. In comparison, Kendon (1967) , in attempting to identify whether the speaker was looking at their conversation partner more at the end of their turns, compared the number of turns in which the speaker gazed toward the listener at the end of the turn with the number of turns in which the speaker did not.

Kendon's (1967) findings were further challenged by Beattie (1978 , 1979) in respect of proposed methodological limitations. Beattie (1978) pointed out that Kendon (1967) failed to provide the definition of “utterance” he used or to distinguish different types of gaze (i.e., prolonged, sustained). Furthermore, Beattie (1978) noted that in one of Kendon's (1967) analyses, examining utterances with delayed responses, Kendon used data from only two of the seven dyads and also reported data about “long utterances,” which actually included data from “all utterances.” Beattie's (1978) overall findings did not support Kendon's (1967) claim that eye gaze facilitates turn taking. In fact, Beattie (1978) found the opposite effect: more turns ended with gaze aversion.

Kendon (1978) responded by highlighting multiple differences in the methodologies of the studies by Beattie (1978) and Rutter et al. (1978) that may have contributed to the different findings between the studies. For example, both the topic and the type of conversation differed between Rutter's first experiment and Kendon's study. Rutter et al. (1978) used 3-min segments from the beginning, middle, and end of a “getting acquainted” conversation. Kendon also used segments from a “getting acquainted” conversation but mainly concentrated on the segments toward the end of a 30 min conversation ( Kendon, 1978 ). Kendon (1978) argued that Rutter et al.'s (1978) choice to include the first 3 min of conversation, when people spent time exchanging details about themselves, may have affected their eye gaze behavior. For example, Exline et al. (1965) observed that when participants are asked very personal questions about their fears and desires, they are more likely to avoid mutual gaze than during non-personal ones. Kendon (1978) also noted differences in the status of the speakers between his study and Beattie's (1978) , where the speakers were of unequal status, specifically a student and a supervisor. More recent investigation suggests that the social status of one conversation partner affects the eye gaze behavior of the other, such that people with a high status tend to be observed more often and for longer periods of time than people with a lower status ( Foulsham et al., 2010 ).

These three early studies ( Kendon, 1967 ; Beattie, 1978 ; Rutter et al., 1978 ) highlighted the importance of study variables when designing studies of the role of eye gaze in conversation. Many further studies have also identified various factors that affect eye gaze direction during conversation. For example, the amount and direction of gaze tend to be affected by acquaintance status ( Strongman and Champness, 1968 ; Rubin, 1970 ; Bissonnette, 1993 ), spatial arrangements between conversation partners ( Argyle and Dean, 1965 ; Argyle et al., 1973 ; Blythe et al., 2018 ), gender ( Argyle and Dean, 1965 ; Myszka, 1975 ; Bissonnette, 1993 ), cultural and ethnic factors ( McCarthy et al., 2006 ; Rossano et al., 2009 ), conversation topic ( Exline et al., 1965 ; Glenberg et al., 1998 ), and emotional state ( Kendon, 1967 ; Kleinke, 1986 ; Adams et al., 2005 ).

Understanding the role of eye gaze in turn-taking requires an understanding of how turns work in conversations. From a linguistic perspective, turn-taking consists of many components and rules (for a full review see: Duncan, 1972 ; Sacks et al., 1974 ). Duncan (1972) proposed that turn taking is communicated through a set of rules and behavioral signals that both speakers and listeners follow. For example, the next speaker can take their speaking turn if the current speaker shows one or multiple “turn-yielding” signals. The turn-yielding signals include rising or falling intonation at the end of a phonemic clause, a stressed syllable at the end of a phonemic clause, turning the head toward the listener, and/or ceasing hand gestures. However, if the current speaker wishes to continue and hold their turn despite displaying some turn-yielding signals, the attempt-suppressing signal (i.e., turn holding), which consists of the speaker using hand gestures, almost always prevails ( Duncan, 1972 ).

Conversation Analysis (CA; Sacks et al., 1974 ) was pioneered to study social interactions while taking human actions and social context into consideration (for a full review see: Goodwin and Heritage, 1990 ). Sacks et al. (1974) considered that turn taking is influenced by two components. The first, the “Turn Constructional Unit” (TCU), defines a turn as a construct made of sentential, clausal, lexical, or phrasal units. The speaker is permitted to finish one such unit, and the first possible completion of this unit represents a “Transition Relevance Place” (TRP), where the next speaker may take over a speaking turn ( Sacks et al., 1974 ). The second component is the “Turn Allocation” component, a technique used to allocate the next speaker. Sacks et al. (1974) proposed that a turn can be allocated either by the current speaker selecting the next speaker using some form of reference, such as direct eye gaze, or by listeners self-selecting themselves to be the next speaker. Furthermore, these two components are accompanied by rules similar to those described by Duncan (1972) : for example, if the current speaker selects the next speaker at the TRP, then the observing participants, i.e., those not selected to take the next turn, should not proceed. During self-selection, the first person to start speaking is granted the turn ( Sacks et al., 1974 ).

Additionally, Schegloff and Sacks (1973) , Sacks et al. (1974) , and Schegloff (1972) proposed that types of action sequences play a role in next-speaker selection. These sequences consist of two parts that are relevant to each other, where the first part, produced by one speaker, selects the next speaker to contribute the second part ( Schegloff and Sacks, 1973 ). These may include question-answer sequences ( Schegloff, 1972 ), greeting-greeting sequences ( Schegloff, 1968 ), other-initiated repair sequences ( Schegloff, 1997 ), or sequence-initiating actions ( Robinson and Bolden, 2010 ). For example, during sequence-initiating actions, one speaker may offer a favor to another person, who is then obliged to refuse or accept; in this case the two parts would be offer-refusal/acceptance ( Robinson and Bolden, 2010 ). Alternatively, during greeting-greeting sequences, one person may greet another person, which obliges the other person to greet them back ( Schegloff, 1968 ). Furthermore, work by Rossano (2012) revealed that participants' eye gaze behavior was most likely to be influenced by this sequential organization of turns, as proposed by Schegloff (1972) , Schegloff and Sacks (1973) , Sacks et al. (1974) , and Schegloff (1997) , and may operate in different ways when listening to simple questions than when listening to extended stories ( Rossano, 2012 ). Referring back to the studies by Kendon (1967) , Beattie (1978) , and Rutter et al. (1978) , these used different definitions of turns and analyzed conversations consisting of a variety of sentence types, including greetings, questions, and possibly extended stories, which likely influenced the findings.

Since the early days of eye gaze research, many further studies have been undertaken using a wide range of methodological approaches to clarify and extend our understanding of eye gaze in conversation. Reflecting the importance of both methods and results in eye gaze research, the aim of this review is to (i) summarize findings on the role of eye gaze in relation to turn taking and (ii) summarize the major methodological considerations in this field of research. It is hoped that this review will benefit researchers who are new to this field and are seeking to learn more about the subject or to conduct their own research.

A systematized review method was chosen because it aims to include elements of a systematic review, such as a comprehensive search, but has fewer restrictions on inclusion criteria ( Grant and Booth, 2009 ). This suits the requirements of the current search, which aimed to return the broad range of articles and methodologies employed in this field. A seven-step framework for systematic review was used for guidance, with four (research question, literature search, data extraction, results) of the seven steps used in this review ( Wright et al., 2007 ). The first step, based on the framework of Wright et al. (2007) , was to formulate a research question:

• Does the healthy adult population use eye gaze to regulate turn taking in conversations?

The second step was to conduct a literature search. A total of 20 search terms were created and combined to reflect the population of interest, exposure, and outcome ( Table 1 ). To capture a broader range of literature, appropriate Boolean operators were used ( Table 1 ). The literature search was carried out on two databases relevant to the psychology field: PsycINFO and Web of Science. As Web of Science includes categories unrelated to psychology, to simplify the search, categories such as ophthalmology, neuroscience, engineering electrical electronic, and zoology were excluded. The included Web of Science search categories were psychology experimental, psychology, psychology social, psychology biological, and psychology educational.


Table 1 . A list of search terms.
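
As an illustration of how such term groups can be combined, the short sketch below assembles a Boolean query string from three hypothetical term groups (population, exposure, outcome). The placeholder terms are for illustration only; the actual 20 terms used in the search are those listed in Table 1.

```python
# Illustrative sketch of combining search-term groups with Boolean operators.
# The terms below are placeholders; the actual 20 terms are listed in Table 1.
population = ["adult*", "conversant*", "speaker*"]
exposure = ['"eye gaze"', '"eye contact"', "gaze"]
outcome = ['"turn taking"', '"turn-taking"', '"speaker transition"']

def or_group(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Concepts are joined with AND so that every record must match all three.
query = " AND ".join(or_group(group) for group in (population, exposure, outcome))
print(query)
# (adult* OR conversant* OR speaker*) AND ("eye gaze" OR "eye contact" OR gaze) AND ...
```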

No restriction was placed on the date of publication, in order to cover the evolution of research into the role of eye gaze in turn taking. The included papers spanned qualitative, quantitative, and mixed methods. All included studies observed human-to-human conversations to measure gaze direction within speech turns. Studies that manipulated participants' gaze behavior (e.g., instructed to stare at the partner) and studies on animals, children, robots, and all mental health or cognitive disorders were excluded. The search was conducted in the last week of May 2019. Screening was performed on the 3,899 retrieved papers. A total of 421 duplicates and 288 unpublished dissertations were removed. The remaining papers were journal articles, book chapters, and conference papers. The review only included original research papers; therefore, book chapters summarizing the findings of other studies were excluded. After reviewing titles and abstracts, 3,147 papers were excluded for not meeting the criteria. A total of 43 papers were further investigated and, after applying the exclusion criteria, 20 papers were selected for final analysis. A further hand search was carried out by scanning the publication titles in the reference lists of the 20 selected papers and then reading the abstracts of the selected titles. This process identified an extra nine papers for inclusion ( Figure 1 ).


Figure 1 . Flow diagram of search procedure.
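
As a simple check of the screening funnel described above and shown in Figure 1, the sketch below reproduces the reported counts step by step; it is purely illustrative arithmetic rather than part of any study's analysis.

```python
# Screening funnel using the counts reported in the text (see Figure 1).
retrieved = 3899
after_duplicates = retrieved - 421               # duplicates removed -> 3478
after_dissertations = after_duplicates - 288     # unpublished dissertations removed -> 3190
full_text_screened = after_dissertations - 3147  # title/abstract exclusions -> 43
included_from_search = 20                        # remaining after full exclusion criteria
hand_search_additions = 9                        # identified from reference lists
final_included = included_from_search + hand_search_additions
print(full_text_screened, final_included)        # 43 29
```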

The third step was to conduct data extraction. Each of the 29 selected papers was read in full, and information regarding research methods, such as the language in which the study was conducted, participant demographics, and study procedures, was recorded in a spreadsheet. Information was also recorded about the data collection method, the definitions of eye gaze used, and the coding schemes applied to the data. The researchers noted all information about eye gaze patterns in relation to different features of turn taking ( Table 2 ), which was then formed into groups. The final step based on the systematic review framework ( Wright et al., 2007 ) was reporting the results.


Table 2 . Key words and definitions.

Study Characteristics

There was great variation between studies in the level of detail provided, with multiple papers omitting important details about design and participants or the rationale for their methodological decisions. The studies varied in size from five to 69 participants ( Table 3 ), but only 11 studies reported participants' ages ( Lamb, 1981 ; Harrigan and Steffen, 1983 ; Harrigan, 1985 ; Egbert, 1996 ; Rossano et al., 2009 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ; Jokinen et al., 2013 ; Ho et al., 2015 ; Holler and Kendrick, 2015 ; Brône et al., 2017 ), which ranged between 18 and 65 years. Fourteen of the 29 studies examined conversation in dyads, 11 looked at triads, six studied multiparty conversations, and one did not report the number of interactants ( Table 3 ). In 17 of the studies the participants were acquainted with each other, in eight they were unacquainted, and eight did not specify the relationship between participants ( Table 3 ). Seven studies reported only same-sex conversations, seven reported only mixed-sex conversations, four studies reported both same- and mixed-sex conversations, and 12 studies did not specify ( Table 3 ).


Table 3 . Description of interactions.

Seventeen of the 29 studies were quantitative, eight were qualitative, and four used a mixed-methods design ( Table 4 ). Fourteen of the quantitative studies reported reliability scores (ranging between a kappa of 0.46 and 100% agreement) or provided information on how they assessed agreement between multiple coders to correct disagreements ( Table 4 ). The remaining three quantitative studies did not report reliability results ( Table 4 ). Two of the four mixed-methods studies reported how they looked for agreement between multiple coders to correct disagreements ( Table 4 ). Reliability checks were not reported in the qualitative studies ( Table 4 ), as is common practice ( McDonald et al., 2019 ).


Table 4 . Characteristics of study designs and manipulations.
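
For readers unfamiliar with kappa-based reliability scores, the sketch below computes Cohen's kappa for two hypothetical coders who annotated each conversational frame as gaze directed at the partner or averted. The annotations are invented for illustration and do not reproduce data from any of the reviewed studies.

```python
# Minimal illustration of Cohen's kappa for two coders annotating gaze direction.
# 1 = gaze at partner, 0 = gaze averted; data are invented for illustration.
coder_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

n = len(coder_a)
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n  # raw proportion of agreement

# Chance agreement: probability both coders say 1 plus probability both say 0.
p_a1, p_b1 = sum(coder_a) / n, sum(coder_b) / n
expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))  # agreement corrected for chance (0.58 for this toy data)
```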

Only 18 of the 29 studies reported the language in which the conversations took place: two of the studies observed conversations in Dutch, three in Japanese, three in German, six in English, one in both Dutch and German, one in English and Lebanese Arabic, one in Italian, Yeli Dnye (Papua New Guinea region) and Tzeltal (Mexico region), and one study in four Australian Aboriginal languages. Of the eleven studies that did not specify the language of the conversation, two were conducted in universities in England, one in Canada, and one in the US. The remaining seven studies did not specify the language or location of the study ( Table 4 ).

The studies varied in conversation activity: in 20 studies the participants were instructed to converse freely, in ten they were asked to discuss a specific topic, in two the participants completed tasks (a memory recall task, Novick et al., 1996 , and a game, Ho et al., 2015 ), and one study ( Holler and Kendrick, 2015 ) did not specify the instructions provided to participants ( Table 4 ). The length of the coded conversations ranged from 2 min to an hour.

All 29 studies used video recording to capture eye gaze during conversation; however, nine did not specify how many cameras were used ( Beattie, 1978 , 1979 ; Rutter et al., 1978 ; Goodwin, 1980 ; Harrigan, 1985 ; Egbert, 1996 ; Lerner, 2003 ; Park, 2015 ; Blythe et al., 2018 ). Seven studies used one camera for each participant ( Lamb, 1981 ; Bavelas et al., 2002 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ; Ho et al., 2015 ; Holler and Kendrick, 2015 ; Ijuin et al., 2018 ), three studies used one camera for the whole group interaction ( Kendon, 1967 ; Harrigan and Steffen, 1983 ; Streeck, 2014 ), seven studies video recorded both each participant and the whole group interaction ( Kalma, 1992 ; Novick et al., 1996 ; Brône et al., 2017 ; Kendrick and Holler, 2017 ; Auer, 2018 ; Weiss, 2018 ; Zima et al., 2019 ), two studies only video recorded two out of three participants and eye tracked the third participant ( Jokinen et al., 2009 , 2013 ), and one study used two cameras to capture interactions in Italian and only one camera to capture interactions in Tzeltal and Yeli Dnye ( Rossano et al., 2009 ). Eleven studies used camera-based eye tracking technology ( Table 4 ), which permits investigators to measure participants' visual behavior by detecting and tracking the movement of different parts of the eye (see review: Morimoto and Mimica, 2005 ). Of these, two studies used a single table eye tracker to track one out of three participants ( Jokinen et al., 2009 , 2013 ) and one study tracked the eyes of only two out of three participants in the conversation due to technical issues ( Auer, 2018 ).

One study ( Ijuin et al., 2018 ) used gaze ratio to measure the role of eye gaze in conversation, with the other 28 studies using gaze direction ( Table 5 ). Even so, the studies largely failed to define the key variable “gaze” or defined it very vaguely ( Table 5 ). Only five studies included a time scale in defining gaze fixation, with the minimum duration of a gaze fixation ranging from 0.12 to 1 s ( Table 5 ). There was also large variation in the segments of conversation analyzed, including long utterances lasting more than 5 s, speech interruptions, question-response sequences, and backchannels ( Table 5 ). Studies used a variety of methods to transcribe their verbal and non-verbal data, including pictographic symbols, a four-channel push-button system, the ethogram method, the Conversation Analysis (CA) method, and GAT and GAT2 (Gesprächsanalytisches Transkriptionssystem; Selting et al., 1998 ) ( Table 5 ). Eleven studies used computer software, including Anvil, ELAN, and Adobe, to annotate their verbal and non-verbal data ( Table 5 ). These varied methodological scenarios were examined to look for patterns in eye gaze during conversation.


Table 5 . Definitions of key variables and coding method.
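
To make concrete how a minimum-duration criterion determines what counts as a gaze event, the sketch below filters hypothetical periods of looking toward a partner using two of the thresholds reported across studies (0.12 s and 1 s; Table 5). The timings are invented and serve only to show how the choice of threshold changes the count of gaze events.

```python
# Sketch: how a minimum-duration threshold changes which looks count as gaze events.
# Each tuple is the (start_s, end_s) of a period of looking toward the partner; data invented.
looks = [(0.0, 0.08), (1.5, 1.75), (3.0, 4.4), (6.2, 6.9)]

def gaze_events(looks, min_duration_s):
    """Keep only looks lasting at least min_duration_s seconds."""
    return [(start, end) for start, end in looks if (end - start) >= min_duration_s]

print(len(gaze_events(looks, 0.12)))  # 3 events with the most liberal threshold
print(len(gaze_events(looks, 1.0)))   # 1 event with the strictest threshold
```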

Based on the detailed examination of the articles, eye gaze patterns in conversations were grouped into six themes: (a) starting a turn, (b) eye gaze behavior during speech, (c) simultaneous speech, (d) turn yielding, (e) unaddressed participant's view, (f) unwillingness to take a turn ( Table 6 ). Each of these is described below.


Table 6 . A list of study outcomes.

Eye Gaze Starting a Turn

The proposal by Kendon (1967) , that speakers tend to avert their gaze at the start of the turn, was confirmed in both dyadic ( Cummins, 2012 ; Ho et al., 2015 ) and triadic ( Jokinen et al., 2009 ) conversations and with acquainted ( Jokinen et al., 2009 ; Cummins, 2012 ) and unacquainted ( Ho et al., 2015 ) participants. For example, Jokinen et al. (2009) found that in 69% of cases, participants started their turn with their gaze averted from the conversation partners. Ho et al. (2015) compared eye gaze behavior relating to turn taking during two games with different rules and playing styles, and found that the pattern of averted eye gaze at the start of the turn remained relatively consistent across both games.

When examining eye gaze behavior in relation to turn taking, Novick et al. (1996) identified two patterns of gaze behavior as two conversation partners recalled a string of letters. In this study, each pair of participants was given the task of memorizing and reconstructing 17 letter sequences by taking turns in conversation. Both participants were given a sequence of letters; however, the sequences contained some blank spaces that only their conversation partner was able to fill in by recalling the memorized letters. The study reported that 42% of turns followed a mutual break pattern, in which the speaker ended an utterance with a gaze toward the next speaker, followed by a brief mutual gaze, and the next speaker then started with an averted gaze. The mutual hold pattern, which occurred in 29% of turns, differed in that the next speaker held their partner's gaze when starting their turn. The mutual break pattern was used more often in conversations that required fewer turn-taking attempts to complete the task, which was associated with more successful memory recall. The mutual hold pattern was used most often in conversations requiring more attempts at turn taking, which was associated with participant uncertainty about their recall and with more self- or partner-corrections ( Novick et al., 1996 ). Consequently, the finding that participants were more likely to avert their gaze ( mutual break ) at the start of successful recalls, which imposed a higher cognitive demand, suggests that eye gaze may be influenced by cognitive processing demands.

Kendrick and Holler (2017) examined eye gaze direction in relation to so-called polar questions, which expect an affirmative “yes” or negative “no” answer; depending on the grammatical format of the question, a response is classed as either preferred or dispreferred. For example, “Can you see?”—“Yes, I can” or “Can't you see?”—“No, I can't” are preferred answers, because a positively formulated question receives a positively formulated answer and a negatively formulated question a negatively formulated one, whereas “Can you see?”—“No, I can't” or “Can't you see?”—“Yes, I can” are dispreferred answers, because the polarity of the answer does not match that of the question (see: Bolden, 2016 ; Kendrick and Holler, 2017 ). Kendrick and Holler (2017) found that 53.8% of responses to polar questions started with the speaker gazing away. However, the majority of preferred responses were produced with gaze directed at the questioner, except when the responses were complex or when respondents took time to think about their response, in which case gaze remained averted. Furthermore, the majority of dispreferred responses were produced with gaze averted.

The latter two studies do not provide a strong case for Kendon's (1967) claim that speakers tend to gaze away at the start of the turn, with only 42% ( Novick et al., 1996 ) and 53.8% ( Kendrick and Holler, 2017 ) of turns starting with the speaker's gaze averted. However, both studies suggest that when speakers do avert their gaze at the start of a turn, this gaze pattern may be related to the level of cognitive processing, with more complex responses requiring more planning and concentration, during which speakers avert their gaze. Additionally, Harrigan and Steffen (1983) found no supporting evidence that speakers avert their gaze at the start of the turn. The researchers analyzed eye gaze patterns of five people in a group conversation and found that 79% of the time, speakers tended to gaze toward a listener at the start of the utterance ( Harrigan and Steffen, 1983 ).

Finally, four other studies ( Beattie, 1978 ; Rutter et al., 1978 ; Rossano et al., 2009 ; Streeck, 2014 ) concluded that their evidence did not support the suggestion that eye gaze facilitates turn taking. Instead, two of these studies ( Rossano et al., 2009 ; Streeck, 2014 ) argued that eye gaze is used to coordinate the initiation, formation, and closure of action sequences (e.g., question-answer, request-compliance, telling-appreciation) that may take multiple turns to complete ( Rossano et al., 2009 ). More specifically, Rossano et al. (2009) investigated question-answer sequences from three different cultures speaking different languages and found that only a small proportion (Tzeltal: 10.7%, Yeli Dnye: 12.3%, Italian: 16.4%) of questions were asked by the speaker with averted gaze. In contrast, speakers were more likely to initiate the sequence by gazing toward the recipient ( Rossano et al., 2009 ; Streeck, 2014 ). Streeck (2014) argued that the speaker's gaze at the listener, whether maintained during the question initiation or brought back from another preoccupying task (e.g., eating) during the sequence, serves to indicate the salience of what is being said. Furthermore, gazing at the listener during question initiation allows the speaker to check that the listener understands, believes, and/or agrees with the intentions of the act, rather than merely checking whether the listener is paying attention ( Streeck, 2014 ).

Eye Gaze During Speech

In line with Kendon (1967) , several studies confirmed that eye gaze has a monitoring role during conversations ( Eberhard and Nicholson, 2010 ; Cummins, 2012 ; Jokinen et al., 2013 ; Ho et al., 2015 ). In general, listeners tend to gaze at the speaker for long periods of time ( Kendon, 1967 ; Cummins, 2012 ; Jokinen et al., 2013 ; Ho et al., 2015 ), whereas speakers gaze less often and give regular, short glances toward the listener ( Kendon, 1967 ; Eberhard and Nicholson, 2010 ; Jokinen et al., 2013 ; Ho et al., 2015 ). This phenomenon has been observed in dyadic ( Kendon, 1967 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ; Ho et al., 2015 ), triadic ( Jokinen et al., 2013 ), free-flowing ( Kendon, 1967 ; Cummins, 2012 ; Jokinen et al., 2013 ), storytelling ( Eberhard and Nicholson, 2010 ), and game-context ( Ho et al., 2015 ) conversations. However, an opposite effect has been observed during question-answer sequences ( Rossano et al., 2009 ; Streeck, 2014 ). For example, Rossano et al. (2009) explored gaze behavior in three different cultures and found that speakers tend to gaze toward the listener more often (Tzeltal: 65.7%, Italian: 79.9%, Yeli Dnye: 79.9%) than listeners toward the speaker (Tzeltal: 42.3%, Italian: 63.3%, Yeli Dnye: 67.3%).

Ijuin et al. (2018) examined the role of eye gaze in relation to turn taking within groups of three native or non-native language speakers. This is the only study that looked at the amount of time spent gazing, rather than gaze shift patterns, to predict turn taking. They found that speakers in both native and non-native speaking groups tended to look more at the person who was likely to be the next speaker than at the observing listener, in both floor-switching and floor-holding conditions. Furthermore, speakers in both language groups looked at the next speaker more in floor-switching than in floor-holding conditions ( Ijuin et al., 2018 ), suggesting that the gaze ratio among three conversation partners can be used to predict the next speaker.
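
As a rough illustration of the gaze-ratio idea (a sketch with invented timings, not Ijuin et al.'s actual computation), the snippet below tallies how long the current speaker gazes at each listener in a triad and treats the listener receiving the larger share of gaze as the predicted next speaker.

```python
# Illustrative sketch of predicting the next speaker from the current speaker's gaze ratio.
# Seconds the current speaker spent gazing at each listener during a turn (invented values).
gaze_seconds = {"listener_A": 4.2, "listener_B": 1.3}

total = sum(gaze_seconds.values())
gaze_ratio = {listener: t / total for listener, t in gaze_seconds.items()}

# The listener receiving the largest share of the speaker's gaze is predicted to speak next.
predicted_next = max(gaze_ratio, key=gaze_ratio.get)
print(gaze_ratio, predicted_next)
```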

“Backchannels”—the short verbal and non-verbal signals used by listeners to acknowledge the speaker and to convey their understanding—were found to be elicited during mutual gaze between a listener and a speaker ( Kendon, 1967 ; Bavelas et al., 2002 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ). However, Eberhard and Nicholson (2010) found a slight difference between verbal and non-verbal backchannels in the occurrence of mutual gaze. Their findings suggest that overlap between mutual gaze and backchannels was more likely to happen during listeners' non-verbal signals (e.g., acknowledgments—head nods; and exemplifications—facial expressions; 80 and 93%, respectively) than during verbal signals (i.e., acknowledgments—“ok,” “mhm,” “uh huh”; exemplifications—“wow, that's crazy”; 60 and 69%, respectively). The findings suggest that verbal signals alone help to convey the listener's engagement and understanding, and therefore speakers are less inclined to visually check on the listeners.

In their studies, Jokinen et al. (2009 , 2013) found that seating positions have a significant effect on how participants in triadic conversations divide their visual attention between conversation partners. They found that a participant sitting in front of two partners divided their attention between them equally. However, participants with one partner in front of them and another to the side spent about 45% of the time gazing into the distance, 40% of the time at the partner in front of them, and only 15% of the time at the partner sitting next to them ( Jokinen et al., 2009 ). These findings suggest that seating position may mediate the effectiveness of eye gaze ratio in predicting the next speaker when seating arrangements are not equally distributed.

In relation to breakdowns in conversations, Goodwin (1980) found that in order to produce a coherent sentence, speakers preferred to have the recipient's gaze secured. During conversation breakdown, speakers restarted their sentence as a technique to request the listener's gaze. Goodwin (1980) suggested that, in order to avoid restarts, it is preferable that the listener is already gazing at the speaker when the speaker looks at the listener, and not the other way around. Furthermore, speakers also used pauses near the beginning of the sentence to delay speech until the listener's gaze was obtained ( Goodwin, 1980 ; Streeck, 2014 ). Egbert (1996) found that in all of the segments containing the repair-initiator “pardon?” (“bitte?” in German), the conversation partners did not share mutual gaze prior to the initiation of repair, whereas Rossano et al. (2009) found that repair-initiating questions were often initiated with mutual gaze between speaker and listener. However, Goodwin's (1980) suggestion that speakers prefer to have the recipient's gaze secured to prevent conversation breakdown was not supported by Rossano et al. (2009) , who found that during 20% to 30% of questions the listener's gaze was not present, yet repairs were not initiated as a result. In contrast, Streeck (2014) found evidence that when the speaker's gaze was not present during the question, the recipient of the question failed to respond. Blythe et al. (2018) found that during problematic next-speaker selection, when the intended addressee in a multiparty conversation fails to respond, problems often arose due to seating arrangements and lack of mutual gaze. Blythe et al. (2018) noted that when the addressee fails to respond, the current speaker tends to use more engagement tools than before, such as turning their head to gaze toward the addressee or making a vocative reference such as calling the person's name.

A consistent pattern of averted eye gaze during hesitant speech has been found. Kendon (1967) reported that speakers looked at the listeners around 50% of the time during fluent speech, but only 20.3% of the time during hesitant speech. Beattie (1979) found that hesitant speech, which requires more planning (i.e., is cognitively challenging), was associated with averted gaze. Park (2015) found that in interactions with teachers, students were more likely to use an “or-prefaced” self-repair sequence (i.e., immediately starting another turn with an “or” to give an alternative example) when teachers used dispreference signals, such as hesitation and pauses, which were often accompanied by the teacher shifting their eye gaze away from mutual gaze. Two studies ( Eberhard and Nicholson, 2010 ; Brône et al., 2017 ) found that speakers avert their gaze away from the listener during verbal pause fillers (e.g., “uhm”), a behavior associated with speech planning ( Eberhard and Nicholson, 2010 ). More specifically, Brône et al. (2017) found this gaze pattern in 76% of cases and Eberhard and Nicholson (2010) reported this pattern in six out of seven speakers. Similarly, four other studies found that speakers tend to terminate their gaze to indicate a turn hold ( Kendon, 1967 ; Bavelas et al., 2002 ; Jokinen et al., 2013 ; Zima et al., 2019 ), which happens at the Transition Relevance Place (TRP: Table 2 ), during switching pauses and hesitation markers ( Jokinen et al., 2013 ; Brône et al., 2017 ). Jokinen et al. (2013) concluded that eye gaze direction was a better predictor of turn hold than speech.

Finally, evidence suggests that eye gaze behavior tends to vary between and within conversations. For example, Cummins (2012) examined individuals' behavior in multiple dyadic conversations with different partners and found that gaze varied between conversations, suggesting that (i) eye gaze behavior is adaptive and (ii) it is likely to be influenced by the behavior of the conversation partner. Streeck (2014) also found that the speaker's gaze toward the listener varied in frequency and duration; however, unlike in Cummins' study, variation was higher within rather than between conversations. It is important to note that Streeck's (2014) findings were based on one speaker's gaze rather than multiple speakers as in Cummins' (2012) study, which may help to explain the difference between these studies. Rossano et al. (2009) found that the amount of gaze varied based on the type of question sequence and its position in the sequence. For example, speakers tend to gaze less during requests for information, which mostly occur at the start of a sequence, than during requests for repair and confirmation, which mostly occur within an already initiated sequence ( Rossano et al., 2009 ). There is also evidence to suggest that the amount of gaze tends to vary between cultures, with some cultures (i.e., Italian and Yeli Dnye) gazing toward the conversation partner more than others (i.e., Tzeltal; Rossano et al., 2009 ).

Eye Gaze During Simultaneous Speech

When it comes to simultaneous speech, Schegloff (2001) noted that interruptions can be classed as problematic or unproblematic. A problematic interruption occurs when the listener disrupts the speaker's speech with the aim of taking the floor, which prevents the other person from finishing their turn. In contrast, an unproblematic overlap is a short period of simultaneous speech in which one speaker is finishing their turn and another is starting their turn prematurely ( Schegloff, 2001 ). In this review, six studies ( Kendon, 1967 ; Lamb, 1981 ; Harrigan and Steffen, 1983 ; Harrigan, 1985 ; Brône et al., 2017 ; Zima et al., 2019 ) examined eye gaze behavior during problematic and unproblematic speech interruptions and identified two different situations: starting an initial interruption and prevailing in the interruption once it has started.

Three studies ( Kendon, 1967 ; Harrigan and Steffen, 1983 ; Brône et al., 2017 ) reported a similar gazing pattern at the start of an initial interruption. Despite a small number of interruption occurrences, Kendon (1967) observed that during a problematic interruption, speakers tend to stare at each other until one prevails. Harrigan and Steffen (1983) found that interrupting speakers gazed at the listeners at the start of 90% of successful and 83% of unsuccessful problematic interruptions, and 63% of the time at the start of unproblematic overlapped speech. Brône et al. (2017) investigated dyadic and triadic conversations and found that individuals wishing to interrupt the speaker often averted their gaze prior to a problematic interruption and then mostly started the interruption with a direct gaze at the interrupted speaker.

Two studies ( Harrigan, 1985 ; Zima et al., 2019 ) also found a similar eye gaze pattern that influenced which speaker prevailed once the interruption had started. Harrigan (1985) examined verbal and non-verbal behavior in relation to turn taking and found that gazing away was a strategy for prevailing in a problematic interruption. Zima et al. (2019) found that during simultaneous speech with mutual gaze, in 54.7% of cases the speaker who averted their gaze first won the competition for the turn, and 80.5% of these speakers completed their turn successfully, whether that was a turn-holding (problematic interruption) or turn-yielding (unproblematic overlap) scenario. Furthermore, in 62.1% of interruption cases without mutual gaze, the speaker who gazed at the other speaker lost the competition for the turn, whereas in 75.8% of cases the speaker who avoided the other speaker's gaze won the competition.

Finally, Lamb (1981) examined gender differences regarding dominance and speaking order in same-sex triads and found that females who simultaneously spoke first were more likely to avert their gaze, whereas males in the same situation tended to maintain their gaze.

Eye Gaze During Turn Yielding

In line with Kendon's (1967) study, 11 studies confirmed that in general people tend to end their turn with eye gaze directed at the next speaker ( Rutter et al., 1978 ; Harrigan and Steffen, 1983 ; Kalma, 1992 ; Novick et al., 1996 ; Lerner, 2003 ; Jokinen et al., 2009 , 2013 ; Ho et al., 2015 ; Brône et al., 2017 ; Auer, 2018 ; Blythe et al., 2018 ). Kendon (1967) reported that around 71% of speaker turns ended with a gaze toward the listener, and 69% did so in Harrigan and Steffen's (1983) sample. As mentioned above, Novick et al. (1996) identified “mutual break” and “mutual hold” patterns, which between them accounted for a total of 71% of turns ending with a gaze toward the next speaker. Auer's (2018) study confirmed that people end their turn with a directed eye gaze, but highlighted that gaze is not always the dominant factor in selecting the next speaker. For example, a speaker may address a generic question (e.g., to identify a location) to two listeners but address only one of them by gaze; if the gaze-addressed individual takes time to answer, for example because they are not sure of the answer, the gaze-unaddressed participant who knows the answer or gathers their thoughts faster is likely to take the turn.

Kalma (1992) investigated the role of prolonged gaze, which was linked to participants being more dominant in triadic conversations, and found that listeners who received a prolonged gaze from the speaker at the end of the utterance were most likely to be the next speaker. Blythe et al.'s (2018) study highlighted the importance of using direct eye gaze and other engagement tools, such as head turns or vocative references, in order to achieve unproblematic next-speaker selection, in which the person selected by the current speaker takes the next turn. However, Blythe et al. (2018) also found that during non-selecting interrogative questions in multiparty conversations, in which no one is being addressed, the current speaker gazed away from all the listeners to avoid selecting the next speaker, suggesting that the direction of eye gaze in turn taking can be context specific. Rutter et al.'s (1978) second study found that speakers tended to gaze at the end of the utterance more at strangers than at friends. Similarly, speakers at the end of the utterance were less likely to gaze during a cooperative socio-political topic (i.e., when partners held the same point of view) than during a competitive one (i.e., when they held opposing points of view) ( Rutter et al., 1978 ).

As mentioned before, four studies ( Beattie, 1978 ; Rutter et al., 1978 ; Rossano et al., 2009 ; Streeck, 2014 ) concluded that their evidence did not support the suggestion that eye gaze facilitates turn taking, with two studies ( Rossano et al., 2009 ; Streeck, 2014 ) claiming that eye gaze was instead used in relation to the organization of action sequences that may take multiple turns to complete. Nevertheless, Streeck (2014) did find that the recipient of a question is more likely to respond if the speaker is gazing at the recipient at the end of the question. In response to Kendon's (1967) finding that speakers tend to look at the listener at the end of the turn, Rossano et al. (2009) analyzed question-answer sequences from three different language samples and found that speakers only very rarely broke their gaze and shifted it back to the listener at the end of the question (Tzeltal: 7.7%, Yeli Dnye: 5%, Italian: 5% of cases). Rossano et al. (2009) argued that because speakers tend to look at the listener throughout the question without shifting their gaze, gaze cannot be used as a cue for switching speaker roles. However, when it comes to the end of a question-answer sequence, both the listener and the speaker indicate closure by gazing away from one another ( Rossano et al., 2009 ; Streeck, 2014 ). Similarly, Weiss (2018) , who investigated eye gaze behavior in relation to turn taking in triadic conversations, reported that in instances when no one had anything to say, the topic was closed by all three conversation partners gazing away from each other.

Eye Gaze and Unaddressed Participant

The role of the unaddressed participant's eye gaze within conversation has been noted and discussed in several studies ( Kalma, 1992 ; Lerner, 2003 ; Jokinen et al., 2013 ; Holler and Kendrick, 2015 ; Zima et al., 2019 ). Kalma (1992) examined the function of “prolonged” eye gaze in relation to turn taking and found that unaddressed participants were less likely to interrupt speech if the next speaker was selected with a prolonged gaze. Lerner (2003) investigated how speakers select the next speaker in multi-party conversations and found that problems such as speech interruption occur when an unaddressed participant does not see the speaker's intentions and takes the turn instead. Jokinen et al. (2013) noted that in triadic conversations, the observing recipient gazed at the current speaker less than the primary addressee who was being addressed by the speaker, which increased the likelihood that the primary addressee would be the next speaker. Holler and Kendrick (2015) explored timing in relation to turn taking from the unaddressed person's perspective in triadic conversation. They found that during question-answer sequences, the unaddressed participant most often shifted their gaze to the next speaker 50 ms prior to the anticipated end of the current turn, and 40 ms prior to the first point of possible completion of question turns. Holler and Kendrick (2015) concluded that the gaze of the unaddressed participant is mostly anticipatory, but that unaddressed participants are also sensitive to TRP cues ( Sacks et al., 1974 ). Zima et al. (2019) found that during speech interruption, in 60.2% of cases the unaddressed participant helped to appoint the next speaker by gazing either at the original speaker or at the speaker who interrupted the speech.

Eye Gaze With Partner Unwilling to Take a Turn

Three studies ( Jokinen et al., 2009 ; Auer, 2018 ; Weiss, 2018 ) found that a gaze-selected next speaker (i.e., the person the current speaker is gazing at and expects to speak next) who either does not know how to respond or is simply unwilling to take a turn can decline the offer by averting their eye gaze from the current speaker. However, this phenomenon has only been tested in triadic studies ( Jokinen et al., 2009 ; Auer, 2018 ; Weiss, 2018 ) and observed during question-answer sequences ( Auer, 2018 ; Weiss, 2018 ), so it is unclear whether the same would apply to dyadic conversations and to different types of turn sequences. Weiss (2018) also found that in 56% of cases, a gaze-selected next speaker was able to pass on the turn intended for them by redirecting their gaze to the unaddressed participant in a triad. In instances when no one had anything to say, the topic was closed by all three conversation partners gazing away from each other ( Weiss, 2018 ).

The aim of this review was to investigate the literature on the role of eye gaze in relation to turn taking and how this had been studied over the last 50 years. Six themes describing the role of eye gaze in conversation were identified based on 26 studies carried out between 1967 and 2019. Specifically, these themes related to the function of eye gaze at the start of a turn, during conversation, during speech interruption and overlap, at the end of the turn, eye gaze from the view of an unaddressed participant, and finally the role of eye gaze when a participant is unwilling to take a turn.

During conversation, people use eye gaze to monitor each other's availability, reactions, and emotions ( Kendon, 1967 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ; Jokinen et al., 2013 ; Ho et al., 2015 ). Listeners tend to gaze at the speaker more and for longer periods to show their interest, whereas speakers tend to gaze at the listeners more frequently but for shorter periods of time, to monitor the listener's focus of attention. In support, Argyle and Dean (1965) and Argyle et al. (1973) argued that people who were able to see their conversation partners spent more time looking at them to seek additional information than participants who were unable to see their conversation partners but knew where they were seated. In other words, given the opportunity, people prefer to observe their interaction partner. However, the fact that speakers spend less time gazing suggests that direct eye gaze may be distracting for the speaker, and that averted eye gaze may be needed for continuous speech planning or perhaps to avoid being interrupted by the listener by showing unavailability. Two other studies ( Rossano et al., 2009 ; Streeck, 2014 ) found an opposite effect: during question-answer sequences, speakers tend to gaze more toward the listener than the other way around. One explanation may be that the studies differed in the types of conversation they analyzed. As reported in Rossano's (2012) thesis, the gaze behavior of the listener tends to differ when listening to stories vs. simple questions. However, Ho et al. (2015) also analyzed question-answer sequences, and their results supported Kendon's claim. Ho et al.'s (2015) study was conducted in a game context rather than free-flowing conversation, carried out in a laboratory setting rather than a natural environment, and analyzed using statistical methods rather than CA. This highlights that gaze behavior is not straightforward and is influenced by a combination of factors. Furthermore, monitoring each other during conversation is important for coherent conversation, as the presence of mutual gaze helps to avoid conversation breakdowns ( Goodwin, 1980 ; Egbert, 1996 ; Blythe et al., 2018 ) and is often used to restore breakdowns ( Rossano et al., 2009 ; Streeck, 2014 ). In addition, monitoring each other during conversation helps to prompt backchannels ( Kendon, 1967 ; Bavelas et al., 2002 ; Eberhard and Nicholson, 2010 ; Cummins, 2012 ), which are used to show the listener's understanding and focus of attention, and also help the speaker to tell a story with more enthusiasm, dramatic endings, and without repetition ( Bavelas et al., 2000 , 2002 ; Bertrand et al., 2007 ).

A specific eye gaze pattern has been observed during speech interruptions, when the listener disrupts the speaker's speech with the aim of taking the floor ( Kendon, 1967 ; Harrigan and Steffen, 1983 ; Brône et al., 2017 ). Mutual eye contact ( Kendon, 1967 ) or gazing at the interrupted speaker at the initial start of the simultaneous speech ( Kendon, 1967 ; Harrigan and Steffen, 1983 ; Brône et al., 2017 ) may function as a way to check the conversation partner's reaction. In contrast, looking away to break the mutual gaze once the interruption has started, in order to win the turn, may signal that person's unavailability to accept further information from the other speaker ( Harrigan, 1985 ; Zima et al., 2019 ) and a commitment to speech planning ( Glenberg et al., 1998 ). In fact, shifting one's eye gaze away from the conversation partner at the TRP has also been linked to a floor-holding strategy used when people need time to gather their thoughts about what they are going to say next ( Kendon, 1967 ; Bavelas et al., 2002 ; Jokinen et al., 2013 ; Zima et al., 2019 ). Overall, there seems to be a similar eye gaze behavior of looking away, whether when aiming to hold the floor to continue talking or when interrupting speech to start an abrupt turn, all of which is likely linked to the cognitive processes involved in speech planning or to indicating the speaker's unavailability to receive a response from a listener.

Furthermore, the eye gaze behavior of unaddressed participants in multiparty conversations appears to play a large role in monitoring and managing conversations ( Jokinen et al., 2009 ; Auer, 2018 ; Weiss, 2018 ), by contributing to the prevention of simultaneous speech or by helping to solve a dispute between two conversation partners who speak simultaneously in competing for a turn ( Lerner, 2003 ; Zima et al., 2019 ). By monitoring conversations, unaddressed participants are able to perceive each partner's intentions and help to keep the conversation going smoothly. However, an unaddressed participant who is not paying full attention to the conversation partners can equally be the one interrupting the speech ( Lerner, 2003 ). In addition, evidence suggests that unaddressed participants are able to anticipate the end of the turn and tend to shift their gaze to the next speaker prior to the end of the turn, at least in question-response sequences ( Holler and Kendrick, 2015 ). This provides evidence that unaddressed participants listen out for TRP cues in order to ensure a smooth transition between speakers ( Holler and Kendrick, 2015 ).

When it comes to the end of turns, the studies reported in this review strongly support Kendon's (1967) finding that individuals are likely to look at their conversation partner at the end of the turn ( Kendon, 1967 ; Rutter et al., 1978 ; Harrigan and Steffen, 1983 ; Kalma, 1992 ; Novick et al., 1996 ; Lerner, 2003 ; Jokinen et al., 2009 , 2013 ; Streeck, 2014 ; Ho et al., 2015 ; Brône et al., 2017 ; Auer, 2018 ; Blythe et al., 2018 ) to check the next speaker's availability and to signal turn yielding. Direct eye gaze at the end of the turn is especially important in multiparty conversations, as it is often used to select the next speaker ( Blythe et al., 2018 ). However, Rossano et al. (2009) argued that the claim that participants return their gaze to the listener as a way of inviting them to take a turn does not apply to their findings on question-answer sequences, because the speaker tends to look at the listener throughout the question without shifting their gaze; as such, eye gaze cannot be used as a cue for switching speaker roles. However, one could argue that a speaker asking a question with raised intonation at the end ( Duncan, 1972 ), while gazing toward the recipient, is itself a cue for the recipient to take the floor.

The findings relating to eye gaze behavior at the start of turns are less consistent. Kendon's (1967) claim that speakers tend to avert their gaze at the start of the turn has been supported by three studies ( Jokinen et al., 2009 ; Cummins, 2012 ; Ho et al., 2015 ) and disputed by five ( Beattie, 1978 ; Rutter et al., 1978 ; Harrigan and Steffen, 1983 ; Rossano et al., 2009 ; Streeck, 2014 ). It is difficult to pinpoint one reason for the different findings, as there are multiple methodological factors that could have influenced the results. However, one interesting point is that two studies ( Rossano et al., 2009 ; Streeck, 2014 ) claimed that, rather than facilitating turn taking as such, eye gaze instead plays a role in the organization of action sequences. This alternative approach to the role of eye gaze was reported more than a decade ago, but it has not been studied as extensively as Kendon's original claim and there are still many questions to answer. For example: What role does eye gaze have in relation to other types of action sequences? How does it function in triadic conversation settings? What role does the unaddressed participant have? And would the findings be the same if the study was conducted in a controlled laboratory environment without interfering activities and objects? It appears that eye gaze does play a role in communication; however, as reported in this review, not all turns, nor all question-answer sequences, started or ended with the predicted gaze, suggesting that other factors may also contribute. Future studies may benefit from analyzing and reporting on those specific cases to determine what influenced speaker and listener behavior.

Harrigan and Steffen (1983) also found different results from Kendon; however, they were the only researchers reported in this review who attempted to study eye gaze behavior in a larger group setting. Future studies should explore eye gaze behavior in different group sizes to see how it changes as the number of participants increases. The findings of two other studies are indecisive regarding Kendon's (1967) claim, as speakers averted their gaze at the start of the turn only 42% (Novick et al., 1996) and 53.8% (Kendrick and Holler, 2017) of the time. However, both studies (Novick et al., 1996; Kendrick and Holler, 2017) indicated that averted eye gaze at the start of an utterance is linked to cognitive processing of speech, with aversion linked to more complex cognitive demands. This interpretation is in line with Glenberg et al.'s (1998) finding that individuals answering difficult questions requiring more cognitive processing were more likely to avert their gaze before responding. Furthermore, individuals who were instructed to fixate their gaze during a difficult recall task performed worse than participants who were able to avert their gaze (Morales et al., 2000). This cognitive processing account may help to explain some of the discrepancies in predicted gaze within the literature, as easier questions and/or responses may not require as much planning and concentration.

Study Design

Kendon (1978) originally proposed that differing findings in relation to gaze and turn taking may be due to differences in study designs. In the current review, no two studies used the same design and many lacked the essential details needed for replication or comparison with other studies. First, and most importantly, the majority of studies reviewed failed to provide their definition of "eye gaze." The few studies that did gave different definitions. For example, Rutter et al. (1978) defined eye gaze as "looking" behavior that caused a "face-reaction." This definition is quite vague, as it does not distinguish different looking behaviors or explain what a "face-reaction" is. Jokinen et al. (2013) defined a "gaze event" as a focus of visual attention but did not define what exactly "focus" means or how long it lasts. Brône et al. (2017) defined "gaze" as any fixation longer than 0.12 s. Ijuin et al. (2018) defined gaze as "visual attention longer than 0.2 s." In addition, Bavelas et al. (2002) reported that visual scanning lasts around 0.025 to 0.35 s and that gaze fixation is longer than 1 s. It is unclear what time restrictions other studies used to define gaze. The continued existence of such variation is somewhat surprising given that, back in 1978, Beattie (1978) criticized Kendon for not defining the key variables in his 1967 study and was the first person to define gaze in relation to duration. Despite this information being available, only a handful of studies have taken it into consideration.
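
To make the practical consequence of these definitional choices concrete, the sketch below (a minimal illustration, not a reconstruction of any reviewed study's pipeline) shows how the choice of a minimum fixation duration changes which looks toward a partner are counted as gaze events; the sample trace and grouping logic are hypothetical.

```python
# Minimal sketch: how the minimum-fixation-duration definition changes which
# "gaze events" are counted. The thresholds (0.12 s, 0.2 s) follow the
# definitions cited above; the sample data and grouping logic are
# hypothetical illustrations only.

def gaze_events(samples, sample_rate_hz, min_duration_s):
    """samples: sequence of booleans, True = gaze lands on the partner's face."""
    events, run = [], 0
    for on_face in samples:
        if on_face:
            run += 1
        else:
            if run / sample_rate_hz >= min_duration_s:
                events.append(run / sample_rate_hz)
            run = 0
    if run / sample_rate_hz >= min_duration_s:
        events.append(run / sample_rate_hz)
    return events

# Hypothetical 60 Hz gaze trace: a 100 ms look, a 150 ms look, a 250 ms look.
trace = [True] * 6 + [False] * 10 + [True] * 9 + [False] * 10 + [True] * 15
print(len(gaze_events(trace, 60, 0.12)))  # counts the 150 ms and 250 ms looks
print(len(gaze_events(trace, 60, 0.20)))  # counts only the 250 ms look
```

With the 0.12 s threshold used by Brône et al. (2017), two of the three looks in this toy trace count as gaze events; with the 0.2 s threshold used by Ijuin et al. (2018), only one does, illustrating how the same recording can yield different gaze counts under different definitions.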

A failure to clearly define eye gaze, and the use of different definitions across studies, both contribute to different studies reporting different findings. These differences extend to definitions of turns and types of exchange sequences (Table 5). For example, the earlier, pre-1980 studies were more likely to use their own definitions, such as utterances longer than 5 s (Kendon, 1967) or utterances longer than 10 words that ended with a floor change (Rutter et al., 1978). Some later studies adopted better-defined approaches that used TRPs and exchange sequences to define turns (e.g., Sacks et al., 1974; Stivers and Enfield, 2010; Table 5). Hence, the findings from studies that used different definitions or turn sequences may not generalize to other types of turns. Furthermore, as mentioned above, Rossano's (2012) analysis of dyadic conversations revealed that participants' eye gaze behavior was most likely to be influenced by the sequential organization of turns proposed by Schegloff (1972), Schegloff and Sacks (1973), Sacks et al. (1974), and Schegloff (1997). For example, listeners' eye gaze patterns differ when they are listening to simple questions, instructions or remarks compared with extended stories. This may further explain why not all turns and sequences start or end with the predicted eye gaze pattern (Rossano, 2012).

The majority of pre-2002 studies tended to use same-sex conversations only, with mixed-sex conversations appearing in later studies. Some studies used a mixture of same-sex and mixed-sex conversations; however, only two studies directly compared gender. Rutter et al. (1978) found no difference in eye gaze behavior between male-only and female-only conversations, whereas Lamb (1981) found that females who started speaking simultaneously were more likely to avert their gaze, while males tended to maintain their initial gaze. It is surprising that the majority of studies reviewed here ignored gender differences in eye gaze patterns when designing their studies, as these were documented in several early papers (Argyle and Dean, 1965; Myszka, 1975; Bissonnette, 1993). Myszka (1975) noted that during interviews, female interviewees maintained eye contact more than males. Furthermore, participants in same-sex interviews appeared more anxious than participants in mixed-sex interviews, which resulted in low levels of eye contact. Bissonnette (1993) found that females shared more mutual gaze in same-sex dyads than males did, suggesting a female preference for a higher level of intimacy.

Referring back to Beattie's (1978, 1979) studies, the analyzed dyads were male only and consisted of status-influenced conversations between students and supervisors. This closely resembles Myszka's (1975) study, in which male-only dyads were also influenced by the status difference between interviewer and interviewee and showed low levels of eye gaze. As mentioned before, the social status of one conversation partner affects the eye gaze behavior of the other, such that people with high status tend to be observed more often and for longer periods of time than people with lower status (Foulsham et al., 2010). As such, Beattie's (1978, 1979) results may have been influenced by both gender and status factors, which could account for the non-significant findings. However, Beattie's outcomes should not be dismissed, even if conversants' status or a combination of variables proves to change eye gaze behavior, as they still add knowledge about how different variables within conversations change eye gaze behavior. Future research should investigate eye gaze behavior in relation to turn taking in conversations influenced by status differences between conversation partners.

Age and participants' ethnic background are two further variables that studies generally did not report. To our knowledge, there is no literature indicating that eye gaze patterns during turn taking vary with age in the adult population, but this information would be useful for generalizing or comparing results. Participants' ethnic background is another important variable, as there is evidence that eye gaze patterns can differ across ethnic groups (LaFrance and Mayo, 1976) and cultures (Li, 2004; McCarthy et al., 2006; Rossano et al., 2009). LaFrance and Mayo (1976) found that black individuals spent less time looking at conversation partners while listening but more time while speaking, with the opposite effect found for white participants. Li (2004) found that in Canadian/Canadian dyadic conversations, participants gazed at their partner more often and for longer periods of time than participants in Chinese/Chinese dyadic conversations. McCarthy et al. (2006) reported that when thinking about cognitively demanding mathematical, verbal or spatial questions, Japanese participants were most likely to look down, whereas Canadians and Trinidadians were most likely to look up. However, when answering easy questions, the Japanese participants again were most likely to avert their gaze, whereas Trinidadians most often gazed directly and maintained mutual contact. The studies in this review were conducted in a variety of languages (Table 4). Whilst the results confirm similar findings across some of these languages (e.g., English, German, Dutch, Japanese, and Australian Aboriginal languages), Rossano et al. (2009) found some cultural gaze variation in a sample of Italian, Tzeltal and Yeli Dnye speakers. Therefore, it is not possible to confirm that gaze behavior would generalize across all languages and cultures.

Only four of the reviewed studies (Kendon, 1967; Beattie, 1979; Lamb, 1981; Novick et al., 1996) reported the physical distance between participants, and these all differed, ranging from three to six feet. Other studies either reported no information or simply stated that participants sat across a table, even though evidence that the distance between conversation partners affects eye gaze behavior was an early finding (Argyle and Dean, 1965; Argyle et al., 1973). Argyle et al. (1973) found that people sitting two feet apart felt most uncomfortable and shared the least eye contact, and that eye contact increased with distance. This was most prominent for opposite-sex pairs, suggesting that eye gaze is a cue for intimacy (Argyle and Dean, 1965; Argyle et al., 1973). Distance is therefore an important variable that the majority of studies reviewed here did not report.

Another potentially influential factor mentioned earlier is conversation topic. Studies in this review reported free-flowing conversations, memory recall tasks and discussions of a specific given subject, all of which differ in speech complexity and require different levels of cognitive processing. Early evidence suggested that when participants are asked very personal questions about their fears and desires, they are more likely to avoid mutual gaze with an interviewer than during non-personal questions (Exline et al., 1965). Further evidence suggests that eye gaze behavior differs between cooperative and competitive conversations, with speakers gazing less at the end of the turn during cooperative than competitive topics (Rutter et al., 1978), and being more likely to avert their gaze when answering difficult rather than easy questions (Glenberg et al., 1998). Kendon (1967) also noted that the amount of mutual gaze tends to decrease as heightened emotion (e.g., smiling) increases during conversations. These are interesting findings, and future studies could benefit from reporting the general mood of participants or the tone of the conversation for further analysis or comparison between studies. In the current review, the authors were unable to compare studies on the basis of topic, because most studies only reported that the conversation was free flowing, or, where specific topics were used, did not specify the tone of those conversations. Furthermore, the rationale for selecting the topic and type of conversation was largely missing from the papers reviewed.

Acquaintance status (i.e., known or unknown conversants) is another design decision that was frequently not explained or fully explored. Strongman and Champness (1968) found that unacquainted participants who shared positive mutual affiliation spent significantly more time speaking with direct gaze at their partner. Rutter et al. (1978) concluded that acquainted participants gaze less at the end of utterances, whereas Bissonnette (1993) noted that friends in general gazed at each other more than unacquainted participants. Further evidence suggests that couples who are in love gaze at each other more (Rubin, 1970). It appears that eye gaze patterns may differ based on the level of affiliation between conversation partners (Strongman and Champness, 1968; Rubin, 1970), although other factors can also influence this.

There was also a lack of consistency in reporting sample sizes and effects. The quantitative studies reviewed reported sample sizes between 5 and 69 participants and analyzed video segments ranging in length from 2 min to 1 h. However, none of these studies explained the reasons for their sample size or, more importantly, reported effect sizes for their findings. Among the qualitative studies, some did not report the number of participants or how many examples were analyzed to reach their conclusions. The majority of the quantitative studies reported reliability scores, which were mostly high. However, none of the qualitative studies and only one mixed-design study conducted reliability checks. Whilst qualitative studies do not use statistical methods to establish reliability, there are a variety of ways to enhance the trustworthiness of study findings, such as discussing and seeking agreement with another person (Noble and Smith, 2015). It is therefore possible that the qualitative studies reported here could be influenced by subjective bias.
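
As one concrete, hedged illustration of the kind of reliability check that could accompany manually coded gaze data, the sketch below computes Cohen's kappa between two coders' frame-by-frame gaze annotations; the category labels and data are invented for demonstration only.

```python
# Sketch of an inter-rater reliability check for manually coded gaze.
# The labels and data are hypothetical; real studies would compare the
# coders' full frame-by-frame (or event-by-event) annotations.
from sklearn.metrics import cohen_kappa_score

coder_a = ["partner", "averted", "partner", "partner", "down", "partner", "averted", "partner"]
coder_b = ["partner", "averted", "partner", "down",    "down", "partner", "partner", "partner"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")  # chance-corrected agreement between coders
```

Reporting such a chance-corrected agreement statistic, alongside the proportion of data that was double-coded, would make it easier to weigh the findings of manually coded studies against one another.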

The studies summarized here also differed in the approaches they chose to analyze the data, with the majority opting for a statistical approach (i.e., quantitative studies) in which they analyzed the number of gaze occurrences at the start and end of turns. Others opted for the CA approach (i.e., all qualitative and mixed-method studies), which looks at the bigger picture of interaction by taking human actions and social context into consideration (Goodwin and Heritage, 1990). Both approaches have strengths and weaknesses. For example, the statistical approach allows researchers to explore specific hypotheses and to analyze data objectively using statistical methods; however, it does not allow much exploration beyond the hypothesis (Queirós et al., 2017). In contrast, CA does not focus on specific predictions but explores the subject by taking the overall context into consideration (Van Tam, 2016; Queirós et al., 2017). When studying language, context is very important for understanding real meanings and intentions (Van Tam, 2016). However, the CA method is prone to subjectivity arising from the researcher's point of view (Queirós et al., 2017).

Another potentially influential factor to consider is the setting of the studies. Most studies reported here ( Table 3 ) were conducted in a controlled laboratory setting, with limited distractions. In contrast, the ecological studies ( Table 3 ) were likely influenced by different seating arrangements, participants moving around, handling objects, or being distracted by environmental factors (e.g., Goodwin, 1980 ; Rossano et al., 2009 ; Blythe et al., 2018 ; Streeck, 2014 ). Again, both approaches have their strengths and weaknesses. The laboratory-based studies were able to control influential factors resulting in arguably more concrete results. However, these findings do not necessarily generalize directly to real life scenarios. In contrast, the ecological studies explored gaze behavior in a natural setting. However, the uncontrolled environmental variables may have affected gaze behavior, resulting in different conclusions.

In terms of coding, 16 of the studies reviewed here coded their data manually, that is, eye gaze direction was determined by the researchers. The remaining ten, mainly post-2003 studies, used eye tracking devices (Table 4), which is likely to be more accurate as it is not influenced by interpretation bias. However, eye tracking studies are susceptible to data loss due to technical and calibration issues, which was observed in several studies reviewed here (Jokinen et al., 2009, 2013; Ho et al., 2015; Holler and Kendrick, 2015; Auer, 2018). Furthermore, wearing an eye tracking device may itself influence eye gaze behavior, as participants can infer that the study concerns eye gaze even if the researchers did not inform them of its true purpose. Another issue concerns the accuracy of coding mutual gaze. When coding is done manually, it is difficult to determine the precise place on the face where participants were looking (i.e., eyes, lips, forehead). Eye tracking studies have this precise information but often did not report whether mutual gaze was coded only when participants were looking each other directly in the eye, or whether slight deviations from the eye region were also counted as mutual gaze. Either way, it is possible that there are discrepancies in results between manually coded and eye-tracked studies.
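
To make this coding decision explicit, the sketch below shows one possible way of operationalizing mutual gaze from two synchronized gaze streams with a stated tolerance around the partner's eye region; the data, distance measure and tolerance values are assumptions for illustration rather than a description of any reviewed study's software.

```python
# Sketch: operationalizing "mutual gaze" from two synchronized gaze streams.
# The inputs are per-frame distances (in degrees of visual angle) from each
# participant's gaze point to the partner's eye region; the tolerance
# parameter makes a strict vs. lenient definition explicit. All values are
# hypothetical.

def mutual_gaze_frames(dist_a_to_b_eyes, dist_b_to_a_eyes, tolerance_deg=1.0):
    """Return indices of frames where both participants gaze within
    `tolerance_deg` of the partner's eye region."""
    return [
        i for i, (da, db) in enumerate(zip(dist_a_to_b_eyes, dist_b_to_a_eyes))
        if da <= tolerance_deg and db <= tolerance_deg
    ]

dist_a = [0.4, 0.6, 2.5, 0.8, 0.2]   # participant A's gaze-to-eyes distance per frame
dist_b = [0.6, 1.8, 0.3, 0.7, 0.4]   # participant B's gaze-to-eyes distance per frame

print(mutual_gaze_frames(dist_a, dist_b, tolerance_deg=1.0))   # lenient: [0, 3, 4]
print(mutual_gaze_frames(dist_a, dist_b, tolerance_deg=0.5))   # strict: [4]
```

Reporting the tolerance (or the exact area of interest) used to define mutual gaze would allow more direct comparison between eye-tracked and manually coded studies.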

This review has identified a variety of methodological choices that are likely to affect eye gaze behavior in communication. The review is unable to provide a set of specific design guidelines for future studies of eye gaze behavior in communication, as these would very much depend on the research question and the resources available. However, the authors would like to highlight the importance of clearly defining the study variables, such as eye gaze, speech turns or action sequences, to allow easier comparison. The authors recommend defining gaze in terms of a minimum gaze fixation in milliseconds (e.g., 66 ms—Eberhard and Nicholson, 2010; 120 ms—Zima et al., 2019; 200 ms—Ijuin et al., 2018) and explaining the reasons for the chosen fixation duration. When choosing how to define speech, the authors recommend applying a more developed approach such as CA (Schegloff, 1972; Schegloff and Sacks, 1973; Sacks et al., 1974; Stivers and Enfield, 2010) or MUMIN (Allwood et al., 2007). Adopting these definitions would provide consistency in future research. As observed in this review, failing to report key information in qualitative studies is a common phenomenon that makes evaluation of the literature difficult (O'Brien et al., 2014). The authors recommend that qualitative studies provide more detail about their methodological approach, as it was often unclear to what population or situations the results could be applied. Future studies could use this review as a guide to which key variables should be considered at the design stage and reported in publications.

As for this review itself, it is important to note that although the formulation of the research question and the definition of research terms were carried out by both authors, the searching and data extraction were completed by the first author, which may have introduced unintentional bias. This was addressed by checking and validating the emerging themes through discussion of the evidence from the reviewed studies. Readers should be aware that the review searched only two databases and included only literature written in the English language. This may have implications for the reported results, as other important publications may have been missed, as demonstrated by the additional papers identified during the reference scan. Furthermore, the review excluded papers that explored gaze behavior in clinical populations, which likely missed results reported for the healthy control groups in those studies (as these were excluded by the search terms). However, it is hoped that the findings will be of use to researchers working with clinical and non-clinical populations in developing their research questions and methodologies. The review is also limited to a healthy adult population, so it is unclear whether the same eye gaze patterns would be found in children, teenagers or people with mental health conditions.

Conclusions

In conclusion, there is clear evidence that eye gaze plays a role in communication, whether within each turn during speech or in relation to exchange sequences that take multiple turns to complete. More specifically, there is strong support that eye gaze facilitates turn yielding, plays a role in speech monitoring, prevents and repairs conversation breakdowns, and facilitates intentional and unintentional speech interruptions. However, when it comes to starting a turn, the results are somewhat more variable, with several modifiers influencing gaze behavior. Kendon (1978) argued that the difference between his (1967) study and the studies carried out by Rutter et al. (1978) and Beattie (1978) was a product of different study designs. The studies summarized here used a wide range of methodologies, frequently failing to present what motivated their design decisions, yet the majority reported similar outcomes. Whilst there is considerable evidence that eye gaze plays a role in regulating conversations, it must not be forgotten that other signals, such as intonation and gestures (Duncan, 1972), may also help to inform the next speaker of their turn. Jones and LeBaron (2002) pointed out that much of the research on communication examines verbal and non-verbal behaviors separately and suggested that these should be studied as related phenomena. Future studies should learn from the work conducted over the past 50 years to (i) avoid repetition and (ii) guide their methodological decision-making. It is particularly important to agree on definitions of key variables (i.e., eye gaze, turn) for easy comparison, and all methodological decisions (e.g., dyad or triad, conversation topic, etc.) should be justified. This review provides a starting point for future studies to understand the basics of eye gaze in turn taking, make informed decisions about study methods for examining eye gaze, and select variables of interest.

Author Contributions

ZD and AA formulated the research question, defined research terms, validated the emerging themes through discussion of the evidence from the reviewed studies, and agreed on the final version of the manuscript. ZD completed the searching and data extraction, wrote the initial manuscript, and prepared tables. AA provided critical feedback and made revisions to the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by Samsung AI Center, Cambridge. The funder was not involved in the study design, collection, analysis, interpretation of data, or the writing of this article.

Conflict of Interest

ZD and AA were employed by Samsung AI Center, Cambridge, at the time of this research.

Adams, R. B Jr., and Kleck, R. E. (2005). Effects of direct and averted gaze on the perception of facially communicated emotion. Emotion 5:3. doi: 10.1037/1528-3542.5.1.3

Admoni, H., and Scassellati, B. (2017). Social eye gaze in human-robot interaction: a review. J. Hum. Robot Interaction . 6, 25–63. doi: 10.5898/JHRI.6.1.Admoni

Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., and Paggio, P. (2007). The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. Lang. Resour. Eval. 41, 273–287. doi: 10.1007/s10579-007-9061-5

Argyle, M. (1967). The Psychology of Interpersonal Behaviour . Harmondsworth: Penguin.

Argyle, M., and Dean, J. (1965). Eye-contact, distance and affiliation. Sociometry 289–304. doi: 10.2307/2786027

Argyle, M., Ingham, R., Alkema, F., and McCallin, M. (1973). The different functions of gaze. Semiotica 7, 19–32. doi: 10.1515/semi.1973.7.1.19

Auer, P. (2018). Gaze, addressee selection and turn-taking in three-party interaction. Eye-Track. Interact. Studies Role Eye Gaze Dialogue . 10:197. doi: 10.1075/ais.10.09aue

Baron-Cohen, S., (ed.). (1997). “How to build a baby that can read minds: cognitive mechanisms in mindreading,” in The Maladapted Mind: Classic Readings in Evolutionary Psychopathology (East Sussex: Psychology Press), 207–239.

Bavelas, J. B., and Chovil, N. (2000). Visible acts of meaning: An integrated message model of language in face-to-face dialogue. J. Lang. Soc. Psychol . 19, 163–194. doi: 10.1177/0261927X00019002001

Bavelas, J. B., Coates, L., and Johnson, T. (2000). Listeners as co-narrators. J. Pers. Soc. Psychol . 79, 941. doi: 10.1037/0022-3514.79.6.941

Bavelas, J. B., Coates, L., and Johnson, T. (2002). Listener responses as a collaborative process: the role of gaze. J. Commun . 52, 566–580. doi: 10.1111/j.1460-2466.2002.tb02562.x

Beattie, G. W. (1978). Floor apportionment and gaze in conversational dyads. Br. J. Soc. Clin. Psychol . 17, 7–15. doi: 10.1111/j.2044-8260.1978.tb00889.x

Beattie, G. W. (1979). Contextual constraints on the floor-apportionment function of speaker-gaze in dyadic conversations. Br. J. Soc. Clin. Psychol . 18, 391–392. doi: 10.1111/j.2044-8260.1979.tb00909.x

Bertrand, R., Ferré, G., Blache, P., Espesser, R., and Rauzy, S. (2007). “Backchannels revisited from a multimodal perspective,” in Proceedings of Auditory-Visual Speech Processing (Hilvarenbeek).

Birdwhistell, R. (1978). Kinesics and Context. Philadelphia, PA: University of Pennsylvania Press.

Bissonnette, V. L. (1993). Interdependence in dyadic gazing (Doctoral dissertation), The University of Texas at Arlington, Arlington, VA. Retrieved from: https://search.proquest.com/docview/304055260?accountid=13460andpq-origsite=summon

Blythe, J., Gardner, R., Mushin, I., and Stirling, L. (2018). Tools of engagement: selecting a next speaker in Australian Aboriginal multiparty conversations. Res. Lang. Soc. Interact. 51, 145–170. doi: 10.1080/08351813.2018.1449441

Boersma, P., and Weenink, D. (2010). Praat: Doing Phonetics by Computer (version 5.2) . Retrieved from: http://www.praat.org/ (accessed February 6, 2020).

Bolden, G. B. (2016). A simple da? Affirming responses to polar questions in Russian conversation. J. Pragmat . 100, 40–58. doi: 10.1016/j.pragma.2015.07.010

Brône, G., Oben, B., Jehoul, A., Vranjes, J., and Feyaerts, K. (2017). Eye gaze and viewpoint in multimodal interaction management. Cogn. Linguist . 28, 449–483. doi: 10.1515/cog-2016-0119

Carletta, J. (2006). Announcing the AMI meeting corpus. The ELRA Newsletter . 11, 3–5.

Chafe, W. (1994). Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing . Chicago, IL: University of Chicago Press.

Choi, Y. S., Gray, H. M., and Ambady, N. (2005). “The Glimpsed World: unintended communication and unintended perception,” in The New Unconscious , eds R. R. Hassin, J. S. Uleman, and J. A. Bargh (Oxford: Oxford University Press), 309–333.

Cook, G. (1990). Trancribing infinity: problems of context presentation. J. Pragmat . 14, 1–24. doi: 10.1016/0378-2166(90)90061-H

Cummins, F. (2012). Gaze and blinking in dyadic conversation: a study in coordinated behaviour among individuals. Lang. Cogn. Process . 27, 1525–1549. doi: 10.1080/01690965.2011.615220

Dowiasch, S., Backasch, B., Einhäuser, W., Leube, D., Kircher, T., and Bremmer, F. (2016). Eye movements of patients with schizophrenia in a natural environment. Europ. Arch. Psychiatry Clin. Neurosci. 266, 43–54. doi: 10.1007/s00406-014-0567-8

Duncan, S. (1972). Some signals and rules for taking speaking turns in conversations. J. Pers. Soc. Psychol . 23, 283. doi: 10.1037/h0033031

Duncan, S. D. (1983). “Speaking turns: studies of structure and individual differences,” in Nonverbal Interaction , eds J. M.Wiemann and R. P. Harrison (Beverley Hills, CA: Sage), 149–178.

Eberhard, K., and Nicholson, H. (2010). “Coordination of understanding in face-to-face narrative dialogue,” in Proceedings of the Annual Meeting of the Cognitive Science Society , Vol. 32.

Egbert, M. M. (1996). Context-sensitivity in conversation: Eye gaze and the German repair initiator bitte? Lang. Soc . 25, 587–612. doi: 10.1017/S0047404500020820

Exline, R., Gray, D., and Schuette, D. (1965). Visual behavior in a dyad as affected by interview content and sex of respondent. J. Pers. Soc. Psychol . 1:201. doi: 10.1037/h0021865

Farroni, T., Csibra, G., Simion, F., and Johnson, M. H. (2002). Eye contact detection in humans from birth. Proc. Natl. Acad. Sci. U.S.A. 99, 9602–9605. doi: 10.1073/pnas.152159999

Feldstein, S., Alberti, L., and BenDebba, M. (1979). “Self-attributed personality characteristics and the pacing of conversational interaction,” in Of Speech and Time: Temporal Speech Patterns in Interpersonal Contexts , eds A. W. Siegman and S. Feldstein (New York, NY: John Wiley), 73–87.

Foulsham, T., Cheng, J. T., Tracy, J. L., Henrich, J., and Kingstone, A. (2010). Gaze allocation in a dynamic situation: effects of social status and speaking. Cognition 117, 319–331. doi: 10.1016/j.cognition.2010.09.003

Fries, C. C. (1952). The structure of English. New York, NY: Harcourt, Brace & World.

Fry, D. B. (1975). Simple reaction-times to speech and non-speech stimuli. Cortex 11, 355–360. doi: 10.1016/S0010-9452(75)80027-X

Glenberg, A. M., Schroeder, J. L., and Robertson, D. A. (1998). Averting the gaze disengages the environment and facilitates remembering. Mem. Cognit . 26, 651–658. doi: 10.3758/BF03211385

Goffman, E. (1963). Behaviour in Public Places . New York, NY: Free Press.

Goodwin, C. (1980). Restarts, pauses, and the achievement of a state of mutual gaze at turn-beginning. Sociol. Inquiry 5, 272–302. doi: 10.1111/j.1475-682X.1980.tb00023.x

Goodwin, C., and Heritage, J. (1990). Conversation analysis. Ann. Rev. Anthropol. 19, 283–307. doi: 10.1146/annurev.an.19.100190.001435

Grant, M. J., and Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inform. Librar. J . 26, 91–108. doi: 10.1111/j.1471-1842.2009.00848.x

Harrigan, J. A. (1985). Listeners' body movements and speaking turns. Communic. Res . 12, 233–250. doi: 10.1177/009365085012002004

Harrigan, J. A., and Steffen, J. J. (1983). Gaze as a turn-exchange signal in group conversations. Br. J. Soc. Psychol . 22, 167–168. doi: 10.1111/j.2044-8309.1983.tb00578.x

Heritage, J. (2010). “Questioning in medicine,” in Why Do You Ask? The Function of Questions in Institutional Discourse , eds A. F. Freed and S. Ehrlich (Oxford: Oxford University Press), 42–68.

Herrmann, E., Call, J., Hernández-Lloreda, M. V., Hare, B., and Tomasello, M. (2007). Humans have evolved specialized skills of social cognition: the cultural intelligence hypothesis. Science 317, 1360–1366. doi: 10.1126/science.1146282

Ho, S., Foulsham, T., and Kingstone, A. (2015). Speaking and listening with the eyes: gaze signaling during dyadic interactions. PLoS ONE . 10:e0136905. doi: 10.1371/journal.pone.0136905

Holler, J., and Kendrick, K. H. (2015). Unaddressed participants' gaze in multi-person interaction: optimizing recipiency. Front. Psychol . 6, 1–14. doi: 10.3389/fpsyg.2015.00098

Ijuin, K., Umata, I., Kato, T., and Yamamoto, S. (2018). Difference in eye gaze for floor apportionment in native-and second-language conversations. J. Nonverbal Behav . 42, 113–128. doi: 10.1007/s10919-017-0262-3

Jefferson, G. (1973). A case of precision timing in ordinary conversation: Overlapped tag-positioned address terms in closing sequences. Semiotica 9, 47–96. doi: 10.1515/semi.1973.9.1.47

Jokinen, K., Furukawa, H., Nishida, M., and Yamamoto, S. (2013). Gaze and turn-taking behavior in casual conversational interactions. ACM Trans. Interact. Intelligent Syst. (TiiS) . 3:12. doi: 10.1145/2499474.2499481

Jokinen, K., Nishida, M., and Yamamoto, S. (2009). “Eye-gaze experiments for conversation monitoring,” in Proceedings of the 3rd International Universal Communication Symposium (Tokyo: ACM), 303–308. doi: 10.1145/1667780.1667843

Jones, S. E., and LeBaron, C. D. (2002). Research on the relationship between verbal and non-verbal communication: emerging integrations. J. Commun . 52, 499–521. doi: 10.1111/j.1460-2466.2002.tb02559.x

Kalma, A. (1992). Gazing in triads: A powerful signal in floor apportionment. Br. J. Soc. Psychol . 31, 21–39. doi: 10.1111/j.2044-8309.1992.tb00953.x

Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychol. (Amst) . 26, 22–63. doi: 10.1016/0001-6918(67)90005-4

Kendon, A. (1972). Some relationships between body motion and speech. Studies Dyadic Commun . 7:90. doi: 10.1016/B978-0-08-015867-9.50013-7

Kendon, A. (1978). Looking in conversation and the regulation of turns at talk: A comment on the papers of G. Beattie and DR Rutter et al. Br. Psychol. Soc . 17, 23–24. doi: 10.1111/j.2044-8260.1978.tb00891.x

Kendrick, K. H., and Holler, J. (2017). Gaze direction signals response preference in conversation. Res. Lang. Soc. Inter . 50, 12–32. doi: 10.1080/08351813.2017.1262120

Kipp, M. (2001). “Anvil-a generic annotation tool for multimodal dialogue,” in Seventh European Conference on Speech Communication and Technology (Aalborg).

Kleinke, C. L. (1986). Gaze and eye contact: a research review. Psychol. Bull. 100:78. doi: 10.1037/0033-2909.100.1.78

LaFrance, M., and Mayo, C. (1976). Racial differences in gaze behavior during conversations: two systematic observational studies. J. Pers. Soc. Psychol . 33:547. doi: 10.1037/0022-3514.33.5.547

Lamb, T. A. (1981). Nonverbal and paraverbal control in dyads and triads: Sex or power differences? Soc. Psychol. Q . 44, 49–53. doi: 10.2307/3033863

Lazarov, A., Suarez-Jimenez, B., Tamman, A., Falzon, L., Zhu, X., Edmondson, D. E., et al. (2019). Attention to threat in posttraumatic stress disorder as indexed by eye-tracking indices: a systematic review. Psychol. Med . 49, 705–726. doi: 10.1017/S0033291718002313

Lerner, G. H. (2003). Selecting next speaker: The context-sensitive operation of a context-free organization. Lang. Soc . 32, 177–201. doi: 10.1017/S004740450332202X

Lerner, G. H. (2004). Conversation Analysis: Studies From the First Generation (Vol. 125). Amsterdam: John Benjamins Publishing. doi: 10.1075/pbns.125

Li, H. Z. (2004). Culture and gaze direction in conversation. RASK . 20, 3–26.

McCarthy, A., Lee, K., Itakura, S., and Muir, D. W. (2006). Cultural display rules drive eye gaze during thinking. J. Cross Cult. Psychol . 37, 717–722. doi: 10.1177/0022022106292079

McCarthy, A., Lee, K., Itakura, S., and Muir, D. W. (2008). Gaze display when thinking depends on culture and context. J. Cross Cult. Psychol . 39, 716–729. doi: 10.1177/0022022108323807

McDonald, N., Schoenebeck, S., and Forte, A. (2019). Reliability and inter-rater reliability in qualitative research: norms and guidelines for CSCW and HCI practice. Proc. ACM Hum. Comp. Inter . 3, 1–23. doi: 10.1145/3359174

Morales, M., Mundy, P., Delgado, C. E. F., Yale, M., Messinger, D., Neal, R., et al. (2000). Responding to joint attention across the 6-through 24-month age period and early language acquisition. J. Appl. Dev. Psychol . 21, 283–298. doi: 10.1016/S0193-3973(99)00040-4

Morimoto, C. H., and Mimica, M. R. (2005). Eye gaze tracking techniques for interactive applications. Comp. Vision Image Understand . 98, 4–24. doi: 10.1016/j.cviu.2004.07.010

Myszka, T. J. (1975). Situational and intrapersonal determinants of eye contact, direction of gaze aversion, smiling and other non-verbal behaviors during an interview (Doctoral dissertation), University of Windsor, Windsor. Retrieved from: https://scholar.uwindsor.ca/etd/3477

Noble, H., and Smith, J. (2015). Issues of validity and reliability in qualitative research. Evid. Based Nurs . 18, 34–35. doi: 10.1136/eb-2015-102054

Novick, D. G., Hansen, B., and Ward, K. (1996). “Coordinating turn-taking with gaze,” in Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP'96 . (Philadelphia, PA: IEEE) 3, 1888–1891. doi: 10.1109/ICSLP.1996.608001

O'Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., and Cook, D. A. (2014). Standards for reporting qualitative research: a synthesis of recommendations. Acad. Med. 89, 1245–1251. doi: 10.1097/ACM.0000000000000388

Park, I. (2015). Or-prefaced third turn self-repairs in student questions. Lingu. Educ . 31, 101–114. doi: 10.1016/j.linged.2015.06.001

Purcell, J. R., Lohani, M., Musket, C., Hay, A. C., Isaacowitz, D. M., and Gruber, J. (2018). Lack of emotional gaze preferences using eye-tracking in remitted bipolar I disorder. Int. J. Bipolar Disord . 6:15. doi: 10.1186/s40345-018-0123-y

Queirós, A., Faria, D., and Almeida, F. (2017). Strengths and limitations of qualitative and quantitative research methods. Euro. J. Educ. Stud. 3, 369–386. doi: 10.46827/ejes.v0i0.1017

Robinson, J. D., and Bolden, G. B. (2010). Preference organization of sequence-initiating actions: the case of explicit account solicitations. Discour. Studies 12, 501–533. doi: 10.1177/1461445610371051

Rossano, F. (2012). Gaze behavior in face-to-face interaction (Doctoral dissertation), Radboud University Nijmegen, Nijmegen. Retrieved from: http://hdl.handle.net/11858/00-001M-0000-000F-ED23-5

Rossano, F. (2013). “Gaze in conversation,” in Handbook of Conversation Analysis, eds J. Sidnell and T. Stivers (Chichester: Wiley-Blackwell), 308–329. doi: 10.1002/9781118325001.ch15

Rossano, F., Brown, P., and Levinson, S. C. (2009). Gaze, questioning and culture. Convers. Anal. 27, 187–249. doi: 10.1017/CBO9780511635670.008

Rubin, Z. (1970). Measurement of romantic love. J. Pers. Soc. Psychol . 16:265. doi: 10.1037/h0029841

Rutter, D. R., Stephenson, G. M., Ayling, K., and White, P. A. (1978). The timing of looks in dyadic conversation. Br. J. Soc. Clin. Psychol . 17, 17–21. doi: 10.1111/j.2044-8260.1978.tb00890.x

Ryan, A. M., Freeman, S. M., Murai, T., Lau, A. R., Palumbo, M. C., Hogrefe, C. E., et al. (2019). Non-invasive eye tracking methods for new world and old world monkeys. Front. Behav. Neurosci . 13, 1–10. doi: 10.3389/fnbeh.2019.00039

Sacks, H., Schegloff, E. A., and Jefferson, G. (1974). “A simplest systematics for the organization of turn taking for conversation,” in Studies in the Organization of Conversational Interaction , ed J. Schenkein (Cambridge, MA: Academic Press), 7–55. doi: 10.2307/412243

Sato, W., Kochiyama, T., Uono, S., and Toichi, M. (2016). Neural mechanisms underlying conscious and unconscious attentional shifts triggered by eye gaze. Neuroimage 124, 118–126. doi: 10.1016/j.neuroimage.2015.08.061

Scheflen, A. E. (1964). The significance of posture in communication systems. Psychiatry 27, 316–331. doi: 10.1080/00332747.1964.11023403

Schegloff, E. (1972). “Notes on a conversational practice: formulating place,” in Studies in Social Interaction, ed D. Sudnow (New York, NY: The Free Press).

Schegloff, E. A. (1968). Sequencing in conversational openings 1. Am. Anthropol . 70, 1075–1095. doi: 10.1525/aa.1968.70.6.02a00030

Schegloff, E. A. (1997). Practices and actions: boundary cases of other-initiated repair. Discou. Process . 23, 499–545. doi: 10.1080/01638539709545001

Schegloff, E. A. (2001). “Accounts of conduct in interaction: Interruption, overlap, and turn-taking,” in Handbook of Sociological Theory (Boston, MA: Springer), 287–321. doi: 10.1007/0-387-36274-6_15

Schegloff, E. A., Jefferson, G., and Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language (Baltim) 53, 361–382. doi: 10.1353/lan.1977.0041

Schegloff, E. A., and Sacks, H. (1973). Opening up closings. Semiotica 8, 289–327. doi: 10.1515/semi.1973.8.4.289

Selting, M. (2000). The construction of units in conversational talk. Lang. Soc . 29, 477–517. doi: 10.1017/S0047404500004012

Selting, M., Auer, P., Barden, B., Bergmann, J., Couper-Kuhlen, E., Günthner, S., et al. (1998). Gesprachsanalytisches transkriptionssystem (GAT). Lingu. Berich , 173, 91–122.

Selting, M., Auer, P., Barth-Weingarten, D., Bergmann, J. R., Bergmann, P., Birkner, K., et al. (2009). Gesprächsanalytisches transkriptionssystem 2 (GAT 2) . Gesprächsforschung: Online-Zeitschrift zur verbalen Interaktion.

Sidnell, J., (ed.). (2009). Conversation analysis: Comparative perspectives (Vol. 27). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511635670

Stivers, T., and Enfield, N. J. (2010). A coding scheme for question–response sequences in conversation. J. Pragmat . 42, 2620–2626. doi: 10.1016/j.pragma.2010.04.002

Stivers, T., and Rossano, F. (2010). Mobilizing response. Res. Lang. Soc. Interact. 43, 3–31. doi: 10.1080/08351810903471258

Streeck, J. (2014). “Mutual gaze and recognition. Revisiting Kendon's Gaze direction in two-person conversation,” in From Gesture in Conversation to Visible Action as Utterance , eds M. Seyfeddinipur, and M. Gullberg (Amsterdam: Benjamins), 35–55. doi: 10.1075/z.188.03str

Strongman, K. T., and Champness, B. G. (1968). Dominance hierarchies and conflict in eye contact. Acta Psychol. (Amst) . 28, 376–386. doi: 10.1016/0001-6918(68)90026-7

Van Tam, N. (2016). Some considerations on conversation analysis. Int. J. Human. Soc. Sci. Stud. 3, 185–190.

Walker, G. (2016). Phonetic variation and interactional contingencies in simultaneous responses. Discour. Process . 53, 298–324. doi: 10.1080/0163853X.2015.1056073

Weiss, C. (2018). When gaze-selected next speakers do not take the turn. J. Pragmat . 133, 28–44. doi: 10.1016/j.pragma.2018.05.016

Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., and Sloetjes, H. (2006). “ELAN: a professional framework for multimodality research,” In 5th International Conference on Language Resources and Evaluation (LREC 2006) (Genoa), 1556–1559.

Wright, R. W., Brand, R. A., Dunn, W., and Spindler, K. P. (2007). How to write a systematic review. Clin. Orthop. Related Res . 455, 23–29. doi: 10.1097/BLO.0b013e31802c9098

Yngve, V. H. (1970). “On getting a word in edgewise,” in Chicago Linguistics Society, 6th Meeting (Chicago, IL), 567–578.

Zima, E., Weiß, C., and Brône, G. (2019). Gaze and overlap resolution in triadic interactions. J. Pragmat. 140, 49–69. doi: 10.1016/j.pragma.2018.11.019

Keywords: eye gaze, turn taking, dyads, triads, communication, conversation

Citation: Degutyte Z and Astell A (2021) The Role of Eye Gaze in Regulating Turn Taking in Conversations: A Systematized Review of Methods and Findings. Front. Psychol. 12:616471. doi: 10.3389/fpsyg.2021.616471

Received: 12 October 2020; Accepted: 01 March 2021; Published: 07 April 2021.

Copyright © 2021 Degutyte and Astell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ziedune Degutyte, z.degutyte@samsung.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

A Speech-Driven Embodied Communication System Based on an Eye Gaze Model in Interaction-Activated Communication

Yoshihiro Sejima, Koki Ono, and Tomio Watanabe

Conference paper, first online 18 May 2017. In: Human Interface and the Management of Information: Information, Knowledge and Interaction Design (HIMI 2017), pp. 607–616. Lecture Notes in Computer Science, vol. 10273.

Line-of-sight behaviors such as gaze and eye contact play an important role in enhancing embodied interaction and communication through avatars. Many gaze models and communication systems that convey line-of-sight through avatars have been proposed and developed. However, the gaze behaviors generated by these models are not designed to enhance embodied interaction such as activated communication, because the models generate eyeball movements stochastically on the basis of human gaze behavior. Therefore, we analyzed the interaction between human gaze behavior and activated communication by using line-of-sight measurement devices. We then proposed an eye gaze model based on this analysis. In this study, we develop an advanced avatar-mediated communication system in which the proposed eye gaze model is applied to speech-driven embodied entrainment characters called “InterActor.” The system generates the avatar’s eyeball movements, such as gaze and looking away, on the basis of the activated communication, and provides a communication environment in which embodied interaction is promoted. The effectiveness of the system is demonstrated by means of sensory evaluations of 24 pairs of subjects involved in avatar-mediated communication.

1 Introduction

With the advancements in the field of information technology, it is now becoming possible for humans to use CG characters called avatars to communicate in a 3D virtual space over a network. Furthermore, many studies have examined support for remote communication using CG characters such as avatars and agents [ 1 ]. However, current systems do not reproduce embodied sharing through the synchrony of embodied rhythms, such as the nodding and body movements of human face-to-face communication, because the CG characters express nonverbal behavior on the basis of key commands. In human face-to-face communication, not only verbal messages but also nonverbal behaviors such as nodding, body movement, line-of-sight and facial expression are rhythmically related and mutually synchronized between talkers [ 2 ]. This synchrony of embodied rhythms in communication is called entrainment; it unconsciously enhances the sharing of embodiment and empathy in human interaction and accelerates activated communication, in which nonverbal behaviors such as body movements and speech activity increase and the embodied interaction is activated [ 3 ].

In our previous work, we analyzed the entrainment between a speaker’s speech and a listener’s nodding motion in face-to-face communication, and developed iRT (InterRobot Technology), which generates a variety of communicative actions and movements, such as nodding, blinking and movements of the head, arms and waist, that are coherently related to voice input [ 4 ]. In addition, we developed an interactive CG character called “InterActor,” which has the functions of both speaker and listener, and demonstrated that InterActor can effectively support human interaction and communication [ 4 ]. Moreover, we developed an estimation model of interaction-activated communication based on the heat conduction equation and demonstrated its effectiveness in an evaluation experiment [ 5 ].

On the other hand, body movements as well as line-of-sight behaviors such as eye contact and gaze duration play an important role in smooth human face-to-face communication [ 6 ]. Moreover, it has been reported that smooth communication via avatars can be achieved by expressing the avatar’s gaze. For example, Ishii et al. developed a communication system that controls an avatar’s gaze based on an estimated line-of-sight model and demonstrated that utterance is facilitated between talkers using this model in avatar-mediated communication [ 7 ]. We also analyzed human eyeball movement through avatars by using an embodied virtual communication system with a line-of-sight measurement device, and proposed an eyeball movement model consisting of an eyeball delay movement model and a gaze withdrawal model [ 8 ]. In addition, we developed an advanced avatar-mediated communication system by applying the proposed eyeball movement model to InterActors, and demonstrated that the developed system effectively supports embodied interaction and communication. These systems generate the avatar’s eyeball movement with a statistical model based on the characteristics of face-to-face communication. However, from the viewpoint of promoting line-of-sight interaction, it is difficult for these systems to enhance that interaction, because the dynamic characteristics of human line-of-sight in activated communication had not yet been incorporated into their design. Therefore, in our previous research, we analyzed the interaction between activated communication and human gaze behavior by using a line-of-sight measurement device [ 8 ]. On the basis of this analysis, we proposed an eye gaze model consisting of an eyeball delay movement model and a look away model.

In this paper, we develop an advanced avatar-mediated communication system by applying the proposed eye gaze model to InterActors. The system generates the avatar’s eyeball movements, such as gaze and looking away, based on the proposed model using only speech input, and provides a communication environment in which embodied interaction is promoted. The effectiveness of the proposed model and communication system is demonstrated by means of sensory evaluations in an avatar-mediated communication experiment.

2 A Speech-Driven Embodied Communication System Based on an Eye Gaze Model

2.1 InterActor

In order to support human interaction and communication, we developed a speech-driven embodied entrainment character called InterActor, which has the functions of both speaker and listener [ 4 ]. The configuration of InterActor is shown in Fig.  1 . InterActor has a virtual skeleton structure comprising the head, eyes, mouth, neck, shoulders, elbows and hands (Fig.  1 (a)). A texture is mapped onto the 3D surface model that covers the virtual skeleton structure (Fig.  1 (b)). In addition, various facial expressions are realized by applying the smile model developed in our previous research (Fig.  1 (c)) [ 9 , 10 ].

Fig. 1. InterActor: speech-driven embodied entrainment character.

The listener’s interaction model includes a nodding reaction model which estimates the nodding timing from a speech ON-OFF pattern and a body reaction model linked to the nodding reaction model [ 4 ]. The timing of nodding is predicted using a hierarchy model consisting of two stages; macro and micro (Fig.  2 ). The macro stage estimates whether a nodding response exists or not in a duration unit which consists of a talkspurt episode T ( i ) and the following silence episode S ( i ) with a hangover value of 4/30 s. The estimator M u ( i ) is a moving-average (MA) model, expressed as the weighted sum of unit speech activity R ( i ) in Eqs. ( 1 ) and ( 2 ). When M u ( i ) exceeds a threshold value, nodding M ( i ) is also a MA model, estimated as the weighted sum of the binary speech signal V ( i ) in Eq. ( 3 ).

a(j): linear prediction coefficient
T(i): talkspurt duration in the i-th duration unit
S(i): silence duration in the i-th duration unit
u(i): noise
i: frame number
b(j): linear prediction coefficient
V(i): voice (binary speech signal)
w(i): noise

Fig. 2. Interaction model.

The body movements are related to the speech input in that the neck and one of the wrists, elbows, arms, or waist are operated when a body threshold is exceeded. This threshold is set lower than the threshold for the nodding prediction of the MA model, which is expressed as the weighted sum of the binary speech signal. In other words, when InterActor functions as a listener generating body movements, the relationship between nodding and the other movements depends on the threshold values of the nodding estimation.
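
As an illustration only, the following minimal sketch shows the two-stage moving-average estimation described above; the prediction coefficients, model orders, thresholds, and the assumption that the unit speech activity is R(i) = T(i)/(T(i) + S(i)) are placeholders, not the values identified in the cited InterActor work [ 4 ].

```python
# Sketch of the two-stage nodding/body-motion estimation described above.
# Coefficients, thresholds, and the form of R(i) are illustrative
# placeholders; the real values come from the cited work [4].

def ma_estimate(history, coefficients):
    """Moving-average estimate: weighted sum of the most recent inputs."""
    return sum(c * x for c, x in zip(coefficients, reversed(history)))

def macro_estimate(talkspurts, silences, a):
    """Macro stage over duration units; assumes R(i) = T(i) / (T(i) + S(i))."""
    activity = [t / (t + s) for t, s in zip(talkspurts, silences)]
    return ma_estimate(activity, a)

def micro_estimate(voice_frames, b):
    """Micro stage: weighted sum of the recent binary speech (ON/OFF) signal."""
    return ma_estimate(voice_frames, b)

# Placeholder coefficients and thresholds (illustrative only).
a = [0.5, 0.3, 0.2]
b = [0.4, 0.3, 0.2, 0.1]
MACRO_THRESHOLD, NOD_THRESHOLD, BODY_THRESHOLD = 0.6, 0.5, 0.4

talkspurts, silences = [1.2, 0.8, 1.5], [0.3, 0.5, 0.2]  # seconds, most recent last
voice = [1, 1, 0, 1]                                      # binary speech, most recent last

if macro_estimate(talkspurts, silences, a) > MACRO_THRESHOLD:
    m = micro_estimate(voice, b)
    nod = m > NOD_THRESHOLD
    body_motion = m > BODY_THRESHOLD  # body threshold is set lower than the nodding threshold
    print(f"nod: {nod}, body motion: {body_motion}")
```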

2.2 Eye Gaze Model

We proposed an eye gaze model that generates gaze and looking away movements to enhance embodied communication, based on the characteristics identified in our analysis of human eyeball movement. The proposed model consists of the previously proposed eyeball delay movement model [ 8 ] and a look away model, outlined as follows.

(1) Eyeball Delay Movement Model

The eyeball delay movement model consists of a delay of 0.13 s with respect to the avatar’s head movement. First, the angle of the avatar’s gaze direction for the viewpoint in virtual space is calculated using Eq.  4 (Fig.  3 (a)). Then, the avatar’s gaze is generated by adding the angle of the avatar’s head movement to the angle of the avatar’s gaze direction in the fourth previous frame at a frame rate of 30 fps (Eq.  5 ). Figure  3 (b) shows an example of the eyeball delay movement model in an avatar. If the avatar’s head moves, the eyeball moves with a delay of 0.13 s with respect to the head movement in the opposite direction.

θ_AG: rotation angle of the gaze direction
A_Ex, A_Ey: eyeball position of InterActor
P_x, P_y: position of the viewpoint in virtual space
θ_G(i): rotation angle of the eyeball movement
θ_AH(i): rotation angle of InterActor’s head movement
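
Equations (4) and (5) are likewise not reproduced in this text. One plausible reading, consistent with the definitions above and with the description of the 4-frame delay (0.13 s at 30 fps), is the following; the sign convention for the head-movement term depends on the coordinate frame and is an assumption here.

```latex
% Reconstruction of Eqs. (4)-(5); the sign of the head-movement term is assumed.
\begin{align*}
  \theta_{AG} &= \arctan\!\left(\frac{P_y - A_{Ey}}{P_x - A_{Ex}}\right) \tag{4} \\
  \theta_G(i) &= \theta_{AG}(i-4) + \theta_{AH}(i) \tag{5}
\end{align*}
```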

Eyeball delay movement model.

(2) Look Away Model

Our previous analysis of human eyeball movement indicates that direct gaze is limited to about 80% of the total conversation time [8]. Therefore, the look-away model in this study generates eyeball movements other than direct gaze, such as gaze withdrawal and blinking, based on that analysis. When looking away, the avatar’s eyeballs are moved widely in the horizontal direction (Fig. 4); the effectiveness of this movement was confirmed in a preliminary experiment. When the estimated degree of interaction-activated communication falls below a threshold value, the looking-away movement is generated by the proposed model (Fig. 5). The avatar’s gaze is thus modulated so that staring is prevented, and impressions of the conversation such as unification and vividness are enhanced.

Looking away movement.

Look away model.
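
As a rough illustration of how the two sub-models might be combined per frame, the sketch below applies the 4-frame eyeball delay and triggers a horizontal look-away when an activity estimate drops below a threshold. All parameter values, the activity measure, and the combination of angles are placeholders assumed for this sketch, not the authors' implementation.

```python
from collections import deque
import random

DELAY_FRAMES = 4        # 0.13 s at 30 fps, as described in the text
LOOK_AWAY_STEP = 0.15   # rad/frame, horizontal eyeball motion (from the system description)

def eye_gaze_frame(gaze_angle_history: deque, head_angle: float, activity: float,
                   activity_threshold: float = 0.3) -> dict:
    """Compute one frame of eyeball motion.

    gaze_angle_history: the last DELAY_FRAMES gaze-direction angles toward the viewpoint (rad).
    head_angle: current head rotation angle (rad).
    activity: estimated degree of interaction-activated communication (placeholder measure).
    """
    delayed_gaze = gaze_angle_history[0]        # gaze direction from 4 frames ago
    eyeball_angle = delayed_gaze + head_angle   # assumed combination; sign convention may differ
    look_away = activity < activity_threshold   # look-away trigger described in the text
    horizontal_offset = LOOK_AWAY_STEP if look_away else 0.0
    return {"eyeball_angle": eyeball_angle, "look_away": look_away,
            "horizontal_offset": horizontal_offset}

# Hypothetical usage: a ring buffer holding the last 4 gaze angles.
history = deque([0.05, 0.04, 0.03, 0.02], maxlen=DELAY_FRAMES)
print(eye_gaze_frame(history, head_angle=0.1, activity=random.random()))
```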

2.3 Developed System

We developed an advanced communication system in which the proposed model was used with InterActors (Fig. 6). The virtual space was generated by the Microsoft DirectX 9.0 SDK (June 2010) on a Windows 7 workstation (CPU: Core i7 2.93 GHz, Memory: 8 GB, Graphics: NVIDIA GeForce GTS 250). The voice was sampled using 16 bits at 11 kHz via a headset (Logicool H330). InterActors were rendered at a frame rate of 30 fps.

System setup.

When Talker1 speaks to Talker2, InterActor2 responds to Talker1’s utterance with appropriate timing through body movements, including nodding, blinking, and other actions, in a manner similar to the body motions of a listener. A nodding movement is defined as a falling-rising motion in the front-back direction at a speed of 0.15 rad/frame. In addition, InterActor2 generates eyeball movements based on the proposed model; a looking-away movement is defined as a left-right motion of the eyeballs at a speed of 0.15 rad/frame, based on the preliminary experiment. InterActor1, as the speaker’s avatar, likewise generates communicative actions, body movements, and eyeball movements using the MA model and the eye gaze model. In this manner, two remote talkers can enjoy a conversation via InterActors within a communication environment in which a sense of unity is shared through embodied entrainment.

3 Communication Experiment

A communication experiment was carried out to evaluate the developed system.

3.1 Experimental Method

The experiment involved talkers engaged in free conversation. The following three modes were compared: mode (A), with neither eyeball movement nor facial expression; mode (B), with the smile model only; and mode (C), with the combined smile model and eye gaze model. The communication scenes were recorded using two video cameras and screens, as shown in Fig. 7. The subjects were 12 pairs of talkers (12 males and 12 females).


Example of a communication scene using the system.

The experimental procedure was as follows. First, the subjects used the system for around 3 min. Next, they performed a paired comparison of the modes in which, based on their preferences, they selected the better mode of each pair. Finally, they held a free conversation for 3 min in each mode. The questionnaire used a seven-point bipolar rating scale from −3 (not at all) to 3 (extremely), where a score of 0 denotes “moderately.” The conversational topics were not specified in either experiment, and the modes were presented to each pair of talkers in a random order.

The results of the paired comparison are summarized in Table 1, which shows the number of wins for each mode. For example, mode (A) won six comparisons against mode (B) and nine comparisons in total. Figure 8 shows the results calculated from the evaluations in Table 1 using the Bradley-Terry model given in Eqs. (6) and (7) [11].

Comparison of the preference π based on the Bradley-Terry model.

π_i: intensity of mode i
p_ij: probability of the judgment that mode i is better than mode j
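
Equations (6) and (7) are not reproduced in this text. The standard Bradley-Terry formulation consistent with the definitions above is given below; the normalization constant in Eq. (7) is an assumption here (a value of 100 is common in this line of work, but it is not stated in the paper).

```latex
% Standard Bradley-Terry model; the normalization constant is assumed.
\begin{align*}
  p_{ij} &= \frac{\pi_i}{\pi_i + \pi_j} \tag{6} \\
  \sum_{i} \pi_i &= \text{const.} \;(\text{e.g., } 100) \tag{7}
\end{align*}
```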

The consistency of the model fit was confirmed by a goodness-of-fit test (χ²(1, 0.05) = 3.84 > χ₀² = 0.28) and a likelihood-ratio test (χ²(1, 0.05) = 3.84 > χ₀² = 0.27). The proposed mode (C), with both the smile model and the eye gaze model, was evaluated as the best, followed by mode (B), with the smile model only, and mode (A), with no movement.

The questionnaire results are shown in Fig. 9. The Friedman test and the Wilcoxon signed-rank test showed significant differences at the 1% level among modes (A), (B), and (C) for all categories. In addition, “Enjoyment,” “Interaction-activated communication,” “Vividness,” and “Natural line-of-sight” showed significant differences at the 5% level between modes (B) and (C).

Seven-point bipolar rating.

In both evaluations, mode (C), which uses the proposed eye gaze model, was rated best for avatar-mediated communication. These results demonstrate the effectiveness of the proposed eye gaze model in combination with the smile model.

4 Conclusion

In this paper, we developed an advanced avatar-mediated communication system in which our proposed eye gaze model is used by speech-driven embodied entrainment characters called InterActors. The proposed model consists of an eyeball delay movement model and a look-away model. The communication system generates eyeball movements based on this model, together with the entrained head and body motions of the InterActors, using only speech input. Sensory evaluations of avatar-mediated communication demonstrated the effectiveness of the proposed eye gaze model and of the communication system.

References

1. Ishii, K., Taniguchi, Y., Osawa, H., Nakadai, K., Imai, M.: Merging viewpoints of user and avatar in telecommunication using image and sound projector. Trans. Inf. Process. Soc. Jpn. 54(4), 1413–1421 (2013)
2. Condon, W.S., Sander, L.W.: Neonate movement is synchronized with adult speech. Science 183, 99–101 (1974)
3. Watanabe, T.: Human-entrained embodied interaction and communication technology. In: Fukuda, S. (ed.) Emotional Engineering, pp. 161–177. Springer, Heidelberg (2011)
4. Watanabe, T., Okubo, M., Nakashige, M., Danbara, R.: InterActor: speech-driven embodied interactive actor. Int. J. Hum.-Comput. Interact. 17(1), 43–60 (2004)
5. Sejima, Y., Watanabe, T., Jindai, M.: Development of an interaction-activated communication model based on a heat conduction equation in voice communication. In: Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2014), pp. 832–837 (2014)
6. Argyle, M., Dean, J.: Eye contact, distance and affiliation. Sociometry 41(3), 289–304 (1965)
7. Ishii, R., Miyajima, T., Fujita, K.: Avatar’s gaze control to facilitate conversation in virtual-space multi-user voice chat system. Trans. Hum. Interface Soc. 10(3), 87–94 (2007)
8. Sejima, Y., Watanabe, T., Jindai, M.: An embodied communication system using speech-driven embodied entrainment characters with an eyeball movement model. Trans. Jpn. Soc. Mech. Eng. Ser. C 76(762), 340–350 (2010)
9. Sejima, Y., Ono, K., Yamamoto, M., Ishii, Y., Watanabe, T.: Development of an embodied communication system with line-of-sight model for speech-driven embodied entrainment character. In: Proceedings of the 25th JSME Design and Systems Conference, no. 1110, pp. 1–9 (2015)
10. Yamamoto, M., Takabayashi, N., Ono, K., Watanabe, T., Ishii, Y.: Development of a nursing communication education support system using nurse-patient embodied avatars with a smile and eyeball movement model. In: Proceedings of the 2014 IEEE/SICE International Symposium on System Integration (SII 2014), pp. 175–180 (2014)
11. Luce, R.D.: Individual Choice Behavior: A Theoretical Analysis. Wiley, New York (1959)


Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers JP16K01560, JP26280077.

Author information

Authors and Affiliations

Faculty of Computer Science and System Engineering, Okayama Prefectural University, Kuboki 111, Soja-shi, Okayama, Japan

Yoshihiro Sejima & Tomio Watanabe

Benesse InfoShell Co., Ltd., Takayanagihigashimachi 10-1, Kita-ku, Okayama, Japan


Corresponding author

Correspondence to Yoshihiro Sejima .


About this paper

Cite this paper.

Sejima, Y., Ono, K., Watanabe, T. (2017). A Speech-Driven Embodied Communication System Based on an Eye Gaze Model in Interaction-Activated Communication. In: Yamamoto, S. (eds) Human Interface and the Management of Information: Information, Knowledge and Interaction Design. HIMI 2017. Lecture Notes in Computer Science(), vol 10273. Springer, Cham. https://doi.org/10.1007/978-3-319-58521-5_48



Gaze and Eye Tracking: Techniques and Applications in ADAS

Tracking drivers’ eyes and gazes is a topic of great interest in research on advanced driving assistance systems (ADAS). It is a matter of serious discussion in the road safety research community in particular, as visual distraction is considered among the major causes of road accidents. In this paper, techniques for eye and gaze tracking are first comprehensively reviewed and their major categories discussed. The advantages and limitations of each category are explained with respect to their requirements and practical uses. The applications of eye and gaze tracking systems in ADAS are then discussed. The process of acquiring a driver’s eye and gaze data and the algorithms used to process these data are explained, along with how such data can be used in ADAS to reduce the losses associated with road accidents caused by the driver’s visual distraction. A discussion of the required features of current and future eye and gaze trackers is also presented.

1. Introduction

1.1. Background and Motivation

The human eyes, a beautiful and interactive organ of the human body, have unique physical, photometric, and motion characteristics. These characteristics provide the vital information required for eye detection and tracking. In our daily lives, a person’s emotional state, mental occupancy, and needs can be judged from the person’s eye movements. Through our eyes, we identify the properties of the visual world and collect the information essential to our lives. Moreover, in the field of image and video processing, eyes play a vital role in the process of face detection and recognition [1, 2, 3, 4]. The history of eye tracking dates back to the second half of the 19th century, when researchers observed eye movements to analyze reading patterns. The early trackers used a sort of contact lens with a hole for the pupil [5]; in this arrangement, the movements of the eye were tracked using an aluminum pointer connected to the lens. The authors of [6, 7] developed the first non-intrusive eye trackers, using light beams that were reflected off the eye and then recorded on film, and also provided a systematic analysis of reading and picture viewing. A significant contribution to eye tracking research was made by the author of [8] in the 1950s and 1960s, who showed that gaze trajectories depend on the task that the observer has to execute: if observers are asked particular questions about an image, their eyes concentrate on question-relevant areas of the image. The same author also devised a suction cup that could adhere to the human eye in order to analyze visual perception in the absence of eye movements. In the 1970s and afterwards, eye tracking research expanded rapidly [9]. In the 1980s, the eye-mind hypothesis was formulated and critically analyzed by other researchers [10, 11, 12]; it proposed that there is no considerable lag between what is fixated and what is processed. Several aspects of eye tracking in the field of human-computer interaction, as well as eye tracking applications to assist disabled people, were also developed in the same decade [13]. During the last two to three decades, revolutionary development has been observed in eye tracking thanks to the introduction of artificial intelligence techniques, portable electronics, and head-mounted eye trackers.

Eye tracking and gaze estimation are essentially two areas of research. The process of eye tracking involves three main steps: discovering the presence of eyes, precisely interpreting eye positions, and tracking the detected eyes from frame to frame. The position of the eye is generally measured with the help of the pupil or iris center [14]. Gaze estimation is the process of estimating and tracking the 3D line of sight of a person, or simply, where a person is looking. The device or apparatus used to track gaze by analyzing eye movements is called a gaze tracker. A gaze tracker performs two main tasks simultaneously: localizing the eye position in the video or images, and tracking its motion to determine the gaze direction [15, 16]. A generic representation of such techniques is shown in Figure 1. In addition to its application in advanced driving assistance systems (ADAS), gaze tracking is also critical in several other applications, such as gaze-dependent graphical displays, gaze-based user interfaces, investigations of human cognitive states, and human attention studies [17, 18, 19].

Figure 1. The process of tracking eye position and gaze coordinates.
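
The three-step process outlined above (detect eyes, localize eye positions, track frame to frame) can be illustrated with the following minimal OpenCV sketch using stock Haar cascades. This is a generic illustration, not a technique proposed in the reviewed literature; the webcam index and the re-detection-per-frame "tracking" are simplifying assumptions.

```python
import cv2

# Stock OpenCV Haar cascades for face and eye detection (shipped with OpenCV).
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

cap = cv2.VideoCapture(0)  # assumed webcam index
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Step 1: discover the presence of eyes, restricted to detected face regions.
    for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[fy:fy + fh, fx:fx + fw]
        # Step 2: interpret eye positions within the face region.
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi):
            center = (fx + ex + ew // 2, fy + ey + eh // 2)  # rough eye center
            cv2.circle(frame, center, 3, (0, 255, 0), -1)
    # Step 3: here, frame-to-frame "tracking" is simply re-detection every frame;
    # a real tracker would associate detections over time (e.g., with a Kalman filter).
    cv2.imshow("eyes", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```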

Tracking a driver’s eyes and gaze is an interesting feature of advanced driving assistance systems (ADAS) that can help reduce the losses involved in road accidents. According to the World Health Organization’s reports [20, 21, 22], every year approximately 1–1.25 million people die and 20–50 million people are injured in road accidents across the world. Moreover, if the recent trend persists, by 2030 road accidents could be the fifth leading cause of death. In terms of cost, the damages involved in road accidents exceed five hundred billion USD, which is approximately equal to 2% of the gross national product (GNP) of advanced countries, 1.5% of the GNP of medium-income economies, and 1% of the GNP of low-income countries. According to recent studies (e.g., [23]), it is hoped that the number of road accidents related to visual distraction can be reduced by 10–20% through the facial monitoring features of ADAS.

1.2. Contribution and Organization

The intention of this paper is to benefit researchers by offering a comprehensive framework for a basic understanding of eye and gaze tracking and their applications in ADAS. To the best of the authors’ knowledge, this is the first study that reviews visual data (i.e., eye and gaze data) techniques in the context of ADAS applications, although studies do exist on the individual topics covered in this paper.

This paper is organized as follows: Section 2 and Section 3 explain the models and techniques developed for eye and gaze tracking, respectively. The major categories of these models and techniques are discussed, with emphasis on the literature in which they were initially proposed and on their respective benefits and limitations. Section 4 explains the driving process and the challenges associated with a driver’s visual distraction; it describes how a driver’s visual data are collected, processed, and used in ADAS applications, and summarizes the features of modern vehicles that utilize drivers’ visual data and other vehicle parameters. At the end of each section, the necessary information is presented in a comprehensive tabular form. Section 5 concludes the paper with pointers to future directions in this research field. The authors admit that the topic is too wide and deep to be reviewed in a single paper; interested readers are encouraged to consult the references provided at the end of this paper for further study of specific areas or topics not covered in this work. For example, operational definitions of driving performance measures and statistics are well documented in [24].

2. Eye Tracking

2.1. Introduction

The first step in eye tracking is to detect the eyes. The detection of eyes in image or video data is based on eye models. A good eye model should be expressive enough to accommodate the variability in eye dynamics and appearance while being adequately constrained to remain computationally efficient. Eye detection and tracking is an arduous job owing to issues such as the degree of eye openness; variability in size, head pose, and reflectivity; and occlusion of the eye by the eyelids [3, 25, 26]. For instance, a small variation in viewing angle or head position causes significant changes in the eye appearance or gaze direction, as shown in Figure 2. The eye’s appearance is also influenced by the ethnicity of the subject, lighting conditions, texture, the iris position within the eye socket, and the eye status (open or closed). Eye detection methods are broadly categorized based on the eyes’ shape, features, and appearance, as explained below.

Figure 2. The appearances of eyes and eye parts change with head and eye movements. (a) Variability in eye appearance when the eye position is fixed but the head position varies. (b) Variability in gaze direction when the head position is fixed but the eyeball rotates.

2.2. Shape-Based Techniques

An open eye can be efficiently expressed by its exterior (e.g., eyelids) and interior (e.g., iris and pupil) parts. The shape-based techniques are based on a geometric eye model (i.e., an elliptical or a complex eye structure) augmented with a similarity index. The model defines the allowable template deformations and contains parameters for nonrigid template deformations and rigid transformations. The main feature of these techniques is their capability of handling the changes in shape and scale.

2.2.1. Elliptical Eye Models

For simpler eye detection and tracking applications, an elliptical model of the eye’s appearance can serve the purpose. Although simple elliptical eye shape models proficiently represent features such as the pupil and iris under various viewing angles, they fall short in capturing the variations and inter-variations of certain eye features. A major category of techniques that use the simple elliptical eye model is that of model-fitting techniques, which fit the designated features to the elliptical model [27, 28]. Typically, in techniques that utilize the elliptical eye model, pupil boundaries are extracted with the help of edge detection, while transformation algorithms such as the Hough transform are utilized to extract the features of the iris and pupil [29]. The authors of [30] and [31] estimated the center of the pupil ellipse using thresholds on the image intensities. In their techniques, a constraint of shape circularity is employed to improve the efficiency; however, because of this constraint, the model works only for near-frontal faces. Another category of techniques that exploit the simple elliptical eye model comprises the voting-based techniques [31], in which the selected parameters support a given hypothesis through an accumulation process. The authors of [32] proposed a voting scheme that utilized temporal and spatial data to detect the eyes; they used a large temporal support and a set of heuristic rules to reject false pupil candidates. A similar voting scheme, which used edge orientation directly in the voting process, was suggested in [33]. This technique was based on the intensity features of the images, and it relied on anthropomorphic averages and a prior face model to filter out false positives. A limitation of such techniques is that they fundamentally rely on maxima in feature space: when the number of eye-region features decreases, the techniques may mistake other regions, such as eyebrows, for the eyes, so they are typically applicable only when the search region is confined. A low-cost eye tracking system is proposed in [34], where the Starburst algorithm is used for iris detection. This algorithm finds the highest gray-level differences along rays while recursively spawning new rays at the maxima already found; it is essentially an active shape model that uses several features along each normal.
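
As a concrete example of the circular Hough-transform fitting mentioned above, the sketch below detects circular iris or pupil candidates in a close-up grayscale eye image with OpenCV. The image path and all parameter values are illustrative assumptions, not values from the cited studies.

```python
import cv2
import numpy as np

# Load a close-up eye image (hypothetical path) and suppress noise before edge detection.
eye = cv2.imread("eye_closeup.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.medianBlur(eye, 5)

# Circular Hough transform: the iris/pupil boundary is approximated as a circle.
circles = cv2.HoughCircles(
    blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
    param1=100,   # upper Canny edge threshold
    param2=30,    # accumulator threshold: lower values yield more (possibly false) circles
    minRadius=10, maxRadius=80,
)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(eye, (int(x), int(y)), int(r), 255, 1)  # candidate iris/pupil boundary
        cv2.circle(eye, (int(x), int(y)), 2, 255, -1)      # estimated center
cv2.imwrite("eye_circles.png", eye)
```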

2.2.2. Complex Shape Models

Complex shape-based models are based on in-depth modeling of the eye shape [35, 36, 37, 38]. A well-known example is the deformable template model [35], which consists of a circle representing the iris and two parabolas representing the eyelids. To fit the model to an image, energy functions for internal forces, edges, valleys, and image peaks are incorporated in an update rule. However, the right selection of the template’s initial position is crucial for accurate results, as the system cannot detect the eyes if the template is initialized above the eyebrow. Other limitations of this model are the complex template description and difficulties with eye occlusions due to a non-frontal head pose or eyelid closure. The authors of [36] extended this model to extract the eye features by using the eye corners as the initialization points. They used a nonparametric technique (known as the snake model) to determine the head’s outline, and found the approximate eye positions from anthropomorphic averages. The information about the detected eye corners is utilized to reduce the number of iterations in the optimization of the deformable template. Similarly, the authors of [39, 40] proposed ways to speed up the technique of [35]. Some researchers combined the features of complex eye models with elliptical models to improve the accuracy and speed of the localization process (e.g., [41]).

Certain deformable models (e.g., the snake model) can accommodate significant shape variations, while others cannot handle the large variability of eye shapes. Techniques based on deformable eye templates are typically considered more logical, generic, and accurate. However, they have certain limitations, such as requiring high-contrast images, being computationally demanding, and requiring initialization close to the eye. Moreover, for larger head movements, they have to rely on other techniques to provide good results.

2.3. Feature-Based Techniques

Feature-based techniques are based on the identification and utilization of a set of unique features of the human eyes. These techniques identify local features of the eye and the face that have reduced sensitivity to variations in viewing angle and illumination. The features commonly used for eye localization are corneal reflections, the limbus, and dark and bright pupil images. Typically, these techniques first identify and detect the local features; then they apply a filter to highlight the desired features while suppressing the others, or utilize a prior eye shape model to construct a local contour; and, finally, they apply classification algorithms to produce the output. Generally, feature-based techniques are reported to provide good results in indoor applications; however, their outdoor performance is comparatively limited. These techniques are further subcategorized as follows.

2.3.1. Local Features

The eyes’ local features are detected and utilized in combination with a prior shape model to detect and track the eyes [42, 43, 44, 45, 46]. For instance, the approach proposed in [42] first located a specific edge and then employed steerable Gabor filters to trace the edges of the eye corners or the iris. Next, based on the selected features and the eye model, a search policy was adopted to detect the shape, position, and corners of the eye.

The authors of [44] suggested a part-based model, in which an eye part (e.g., an eyelid) is considered as a microstructure. They extracted face features using a multilayer perceptron by locating eyes in face images. The authors of [45] extended the work of [42] and made improvements by utilizing multiple specialized neural networks (NNs) trained to detect scaled or rotated eye images, which worked effectively under various illumination conditions. The authors of [46, 47] detected and utilized information about the area between the two eyes instead of the eyes themselves. The area between the eyes is comparatively bright on its lower and upper sides (nose bridge and forehead, respectively) and has dark regions on its right and left sides. This area is considered more stable and detectable than the eyes themselves; moreover, it can be viewed from a wide range of angles and has a common pattern for most people. The authors of [46, 47] located the candidate points by employing a circle-frequency filter and then eliminated spurious points by analyzing the pattern of the intensity distribution around each point. To enhance the robustness of this method, a fixed “between the eyes” template was developed to identify the actual candidates and to avoid confusion between the eye regions and other parts [48, 49].

2.3.2. Filter Response

The use of specific filters has also been proposed in several techniques to enhance a desired set of features while diminishing the impact of irrelevant ones. For instance, the authors of [50, 51] used linear and nonlinear filters for eye detection and face modeling. They used Gabor wavelets to detect the edges of the eye’s sclera. The eye corners, detected through a nonlinear filter, are utilized to determine the eye regions after elimination of the spurious eye-corner candidates. The edges of the iris are located through a voting method. Experimental results demonstrate that the nonlinear filtering techniques are superior to the traditional, edge-based, linear filtering techniques in terms of detection rates; however, the nonlinear techniques require high-quality images.

2.3.3. Detection of Iris and Pupil

The pupil and iris, being darker than their surroundings, are commonly considered reliable features for eye detection. The authors of [52] used a skin-color model and introduced an algorithm to locate the pupils by searching for two dark areas that fulfill specific anthropometric requirements. Their technique, however, cannot perform well under different lighting conditions owing to the limitations of the skin-color model. Generally, the use of IR light instead of visible light seems more appropriate for dark-region detection. Techniques based on iris and pupil detection require images taken close to the eyes or high-resolution images.

The majority of the feature-based techniques cannot be used to model closed eyes. In an effort to overcome this limitation, a method [ 53 ] was proposed to track the eyes and to retrieve the eye parameters with the help of a dual-state (i.e., open or closed) eye model. The eyelids and eyes’ inner corners are detected through the algorithm proposed in [ 54 ]. This technique, however, requires a manual initialization of the eye model and high contrast images.

2.4. Appearance-Based Techniques

The appearance-based techniques detect and track the eyes by using photometric appearance of the eyes, which is characterized by the filter response or color distribution of the eyes with respect to their surroundings. These techniques can be applied either in a spatial or a transformed domain which diminishes the effect of light variations.

Appearance-based techniques are either image-template-based or holistic in approach. In the former, both the intensity and the spatial information of each pixel are maintained, while in the latter the intensity distribution is considered and the spatial information is disregarded. Image-template-based techniques have limitations associated with scale and rotational changes, and they are negatively influenced by eye movements and head pose variations for the same subject. Holistic approaches (e.g., [55, 56]) use statistical techniques to derive an efficient representation by analyzing the intensity distribution of the entire object’s appearance. The representation of the object, defined in a latent space, is utilized to deal with disparities in the object’s appearance; during the test stage, the similarity analysis between the stored patterns and the test image is performed in this latent space. These techniques usually need a large amount of training data (e.g., the eyes of different subjects under different illumination conditions and facial orientations); however, the underlying models, constructed through regression, are largely independent of the object classes.
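
A minimal sketch of the holistic idea described above, under the assumption that a small set of cropped eye and non-eye patches is available as NumPy arrays: patches are projected into a PCA latent space, and test patches are compared to the stored patterns there. The data, patch size, and classifier choice are placeholders, not a method from the cited studies.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: flattened grayscale patches (rows) labeled eye / non-eye.
rng = np.random.default_rng(0)
patches = rng.random((200, 24 * 24))      # stand-in for real cropped patches
labels = rng.integers(0, 2, size=200)     # 1 = eye, 0 = non-eye (placeholder labels)

# Learn a low-dimensional latent space that captures the intensity distribution.
pca = PCA(n_components=20).fit(patches)
latent_train = pca.transform(patches)

# Similarity analysis between stored patterns and a test patch in the latent space.
clf = KNeighborsClassifier(n_neighbors=5).fit(latent_train, labels)

test_patch = rng.random((1, 24 * 24))
print("eye-likeness:", clf.predict_proba(pca.transform(test_patch)))
```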

2.5. Hybrid Models and Other Techniques

Some techniques are based on symmetry operators [ 57 , 58 , 59 ] while some approaches exploit the data of eye blinks and motions [ 48 , 53 , 60 , 61 , 62 ]. Hybrid models combine the benefits of various eye models in a single arrangement while overcoming their deficiencies. These models, for instance, combine shape and intensity features [ 63 , 64 , 65 ], and shape and color features [ 52 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 ].

2.6. Discussion

The eye detection and tracking techniques, based on their photometric and geometric properties, are discussed in the preceding sections. Each technique has its own pros and cons, and the best performance of any scheme requires fulfillment of specific conditions in image and video data. These conditions are related to ethnicity, head pose, illumination, and degree of eye openness. The existing approaches are usually well applicable to fully open eyes, near-frontal viewing angles, and under good illumination conditions. Table 1 summarizes the various eye detection techniques and compares them under various image conditions.

Summary and comparison of eye detection techniques.

3. Gaze Tracking

3.1. Introduction

The typical eye structure used in gaze tracking applications is shown in Figure 3. The modeling of gaze direction is based either on the visual axis or on the optical axis. The visual axis, which forms the line of sight (LoS) and is considered the actual direction of gaze, is the line connecting the center of the cornea and the fovea. The optical axis, or the line of gaze (LoG), is the line passing through the centers of the pupil, the cornea, and the eyeball. The center of the cornea is known as the nodal point of the eye. The visual and optical axes intersect at the nodal point with a certain angular offset. The position of the head in 3D space can be estimated directly from the 3D location of the corneal or eyeball center, so no separate head-location model is needed. Thus, knowledge of these points is the keystone of the majority of head-pose-invariant models [87, 88].

Figure 3. Structure of the human eye.

The objective of the gaze tracking process is to identify and track the observer’s point of regard (PoR) or gaze direction. For this purpose, important features of eye movements such as fixations, saccades, and smooth pursuit are utilized. A fixation represents the state in which the observer’s gaze rests for a minimum time (typically more than 80–100 ms) on a specific area within 2–5° of central vision. Saccades are quick eye movements that take place when visual attention transfers between two fixated areas, with the aim of bringing an area of interest within the narrow visual field. When a driver visually follows a traveling object, this state is represented by smooth pursuit [62]. The data associated with fixations and saccades provide valuable information that is used for the identification and classification of vision, neurological, and sleep conditions. In medical psychology, fixation data are utilized to analyze a person’s attentiveness and level of concentration. Saccadic eye movements are widely studied in a variety of applications such as human vision research and drowsiness detection for vehicle drivers. Moreover, saccades are also used as a helpful index for determining mental workload: studies show that the saccade distance decreases when a task’s complexity increases [89].
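
The fixation and saccade definitions above translate directly into simple classifiers. The sketch below splits a gaze trace into fixation and saccade samples with a velocity threshold (I-VT style); the sampling rate, threshold, and example trace are illustrative assumptions rather than values from the reviewed studies, and a duration criterion (e.g., the 80–100 ms minimum fixation time) could be added on top.

```python
import numpy as np

def classify_ivt(gaze_deg, sample_rate_hz=60.0, velocity_threshold_dps=30.0):
    """Label each gaze sample as fixation or saccade by angular velocity.

    gaze_deg: (N, 2) array of gaze angles in degrees (horizontal, vertical).
    velocity_threshold_dps: deg/s; samples above it are treated as saccades.
    """
    dt = 1.0 / sample_rate_hz
    velocities = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1) / dt
    labels = np.where(velocities > velocity_threshold_dps, "saccade", "fixation")
    return np.concatenate([["fixation"], labels])  # first sample has no velocity estimate

# Hypothetical trace: a steady fixation followed by a rapid gaze shift.
trace = np.vstack([np.full((30, 2), 1.0), np.linspace([1, 1], [15, 3], 10)])
print(classify_ivt(trace))
```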

Gaze tracking systems take two parameters as input: the eyeball orientation and the head pose (defined by the orientation and position of the head) [90]. To change the gaze, a person can move his or her head while keeping the position of the eyes fixed with respect to the head; alternatively, the gaze direction can be changed by moving the eyeballs and pupils while the head is at rest. These two practices are respectively named “owl” and “lizard” vision in [91] because of their resemblance to these animals’ viewing behavior. Normally, we first move our heads to a comfortable position and then orient our eyes to see something. In this process, the head pose defines the gaze direction on a coarse scale, whereas the fine-scale gaze direction is determined by the eyeball orientation. To further understand the correlation between head pose and eye pose, the study in [91] investigates two questions: (i) how much better can gaze classification methods classify driver gaze using head and eye pose versus using head pose only, and (ii) how much does gaze classification improve with the addition of eye pose information? Generally, information on both the head pose and the pupil position is required in gaze estimation applications. As explained in later sections, the head pose information is usually incorporated implicitly in gaze estimation applications rather than directly. An important aspect of the gaze tracking process is head pose invariance, which is achieved with the help of specific configurations of multiple cameras and other sensors whose a priori knowledge is available to the algorithms.

There are various configurations of lights and cameras, such as single camera, single light [ 88 , 92 , 93 , 94 ]; single camera, multiple lights [ 85 , 95 , 96 , 97 , 98 ]; and multiple cameras, multiple lights [ 30 , 99 , 100 , 101 , 102 , 103 ]. A complementary practice performed in all gaze tracking schemes is known as calibration. During the calibration process, elements of gaze tracking system are calibrated to determine a set of useful parameters, as explained below.

  • Calibration of geometric configuration of the setup is necessary to determine the relative orientations and locations of various devices (e.g., light sources and cameras).
  • Calibration associated with individuals is carried out to estimate the corneal curvature and the angular offset between the optical and visual axes.
  • Calibration of eye-gaze mapping functions according to the applied method.
  • Calibration of the camera is performed to incorporate the inherent parameters of the camera.

Certain parameters such as human specific measurements are calculated only once, whereas the other parameters are determined for every session by making the subject gaze at a set of specific points on a display. The parameters associated with devices, such as physical and geometric parameters of angles and locations between various devices, are calibrated prior to use. A system is considered fully calibrated if the geometric configuration and camera parameters are accurately known.

After introducing the basic concepts related to gaze tracking, the major techniques of gaze tracking are explained as follows.

3.2. Feature-Based Techniques

Feature-based gaze tracking techniques use eyes’ local features for gaze estimation. These techniques are broadly categorized as the model-based and the interpolation-based techniques, as explained below.

3.2.1. Model-Based Techniques

The model-based techniques use the geometric features of an eye model to directly calculate the gaze direction. The point of gaze is determined by the intersection of the gaze path with the gazed-at object [90, 97, 99, 101, 104]. These techniques model the general physical structure of the eye in geometric form to estimate a 3D gaze direction vector; the PoR is then calculated as the intersection of the gaze direction vector with the closest object in the scene.

Typically, three categories of parameters (intrinsic, extrinsic, and variable) are utilized for the development of the geometric eye model [99]. The intrinsic parameters, calculated for a fixed eye, remain unchanged during a tracking session, although they change gradually over the years; they include the iris radius, the cornea radius, the distance between the centers of the cornea and the pupil, the angle between the optical and visual axes, and refraction parameters. The extrinsic parameters, such as the pupil radius, are used to develop a model of the optical axis and the 3D eye position; these models adjust the shape of the eye according to the parameters.

Most 3D model-based techniques (e.g., [88, 90, 96, 97, 102, 104, 105, 106, 107, 108]) depend on metric information and consequently call for a global geometric model of the orientation and position of devices and light sources; camera calibration is also critical in these techniques. Some exceptions use simplified assumptions [27] or projective invariants [95, 98]. We will not discuss the mathematical details of these techniques; however, most of them work on the same fundamental principles. The calibrated output of the cameras is utilized to measure lengths and angles by applying Euclidean relations. The general strategy is to estimate the center of the cornea and then to develop a model of the optical axis. The points on the visual axis cannot be measured directly from the images; however, the offset to the visual axis is estimated by showing one or more points on the screen. The intersection of the visual axis and the screen in a fully calibrated setup provides the PoR.

In a model-based technique, the corneal center, which is the point of intersection of the visual and optical axes, is considered an important parameter for gaze estimation. If the corneal curvature is already known, it is possible to determine the corneal center with the help of two light sources and one camera. For the estimation of corneal curvature, anthropomorphic averages are usually adopted owing to their simplicity and ease of use [107, 109]. However, if the eye-related parameters are unknown, at least two cameras and two light sources are required to estimate the corneal center [96]. Several studies, such as [88, 102, 110], used model-based techniques in a fully calibrated arrangement. At a minimum, a single calibration point is mandatory to estimate the angle between the visual and optical axes; this angle is used to estimate the direction of gaze [102].

3.2.2. Interpolation-Based Techniques

The regression-based methods (e.g., [27, 69, 100, 111, 112, 113, 114, 115]), on the other hand, map the image features to gaze coordinates. They either have a nonparametric form, as in neural networks [113, 116], or a specific parametric form, such as polynomials [112, 117]. In early gaze tracking applications, a single source of IR light was employed to enhance the contrast and consequently produce stable gaze estimates. Many single-glint techniques were implicitly based on the erroneous assumption that “the corneal surface is a perfect mirror.” This assumption implies that the glint should remain stationary as long as the head position is fixed, even when the corneal surface rotates; therefore, the glint is taken as the origin in glint-centered coordinate systems. In this view, the difference between the pupil center and the glint is utilized to estimate the gaze direction, so the pupil-glint difference vector is typically mapped to the screen. The authors of [118] developed a video-based eye tracker for real-time applications. They used a single camera and employed IR light for dark and bright pupil images. To compensate for head movements, they used a set of mirrors and galvanometers. The PoR was estimated using a linear mapping of the pupil-glint vector; higher values of the pupil-glint angles were considered to correspond to nonlinearities, which they compensated for using polynomial regression. Similarly, in a later study, the authors of [73] presented a mapping of the glint-pupil difference vector to the PoR. They utilized a single camera and a second-order polynomial to calculate the x and y coordinates. However, as explained in [112], as the head moves farther from its initial position, the calibration mapping decays. In a way similar to [73] and [118], the authors of [119] proposed a polynomial regression technique for estimating the PoR while assuming a flat corneal surface; additionally, to compensate for the gaze imprecision due to lateral head movements, they proposed a first-order linear interpolation model. The results of these studies suggest that higher-order polynomials do not deliver superior calibration in practical applications; the findings of [119] are also supported by the results of [88, 96].
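
To illustrate the interpolation idea, the following sketch fits a second-order polynomial mapping from pupil-glint difference vectors to screen coordinates by least squares, in the spirit of the techniques discussed above. The calibration points, screen coordinates, and feature set are fabricated for illustration and do not come from any cited system.

```python
import numpy as np

def poly2_features(v):
    """Second-order polynomial features of pupil-glint difference vectors (dx, dy)."""
    dx, dy = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])

# Hypothetical calibration data: pupil-glint vectors and the screen points fixated.
pupil_glint = np.array([[-0.3, -0.2], [0.0, -0.2], [0.3, -0.2],
                        [-0.3, 0.2], [0.0, 0.2], [0.3, 0.2]])
screen_xy = np.array([[100, 100], [640, 100], [1180, 100],
                      [100, 620], [640, 620], [1180, 620]], dtype=float)

# Fit one polynomial per screen coordinate (x and y) by least squares.
A = poly2_features(pupil_glint)
coeffs, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)

# Estimate the point of regard for a new pupil-glint measurement.
new_vec = np.array([[0.1, 0.0]])
print(poly2_features(new_vec) @ coeffs)
```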

For interpolation tasks, NNs and their variants are frequently adopted. The authors of [ 120 ] suggested a generalized NN-based regression technique in which the glint coordinates, pupil-glint displacement, pupil parameters, and ratio and orientation of the pupil ellipse’s major and minor axes are utilized to map the screen coordinates. The main objective of this technique is to eliminate the need for calibration after having performed the initial training. The results of the technique are accurate within 5° even in the presence of head movements. In [ 121 ], the authors used support vector regression to construct a highly non-linear generalized gaze mapping function that accounts for head movement. The results of this technique show that eye gaze can be accurately estimated for multiple users under natural head movement. Most gaze tracking techniques are unable to distinguish if the present input (or test data) is no longer compatible with the training or calibration data. So, the authors of [ 69 , 116 ] used the covariance of the test and training data to indicate when the gaze estimates significantly diverge from the training data.

It is observed that head pose changes are not properly addressed by 2D interpolation techniques, even with eye trackers mounted on the head, as these trackers might slip and change their position. To adjust for minor slippage of head mounts, the authors of [122] proposed a set of heuristic rules. Single-camera 2D interpolation techniques model the eye physiology, geometry, and optical properties only indirectly and are typically considered approximate models; notably, head pose invariance is not strictly guaranteed by them. However, their implementation is simple, requiring neither geometric nor camera calibration, and they produce reasonably acceptable results for minor head movements. Some interpolation-based techniques try to improve the accuracy under increased head movements by using additional cameras or through compensation [123]. The authors of [123] introduced a 2D interpolation-based technique that estimates the 3D head position with the help of two cameras; they modified the regression function using the 3D eye position to compensate for head motions. However, in contrast to other interpolation-based techniques, the technique in [123] requires a prior calibration of the cameras.

3.3. Other Techniques

Most gaze estimation techniques are based on feature extraction and use IR light. In the following subsections, some alternative approaches are discussed: techniques that utilize the reflections from the eye layers (Purkinje images) instead of extracting iris and pupil features [124, 125], appearance-based techniques [89, 114, 126], and techniques that use visible light [114, 116, 127].

3.3.1. Appearance-Based Techniques

The appearance-based gaze estimation techniques take the contents of an image as input, with the objective of mapping them directly to the PoR on the screen. Accordingly, the underlying function that accounts for personal variations has its relevant features extracted implicitly, without requiring camera or geometric calibration. These techniques employ cropped images of the eye to train regression functions, as observed with Gaussian processes [114], multilayered networks [68, 89], and manifold learning [128]. The authors of [114] obtained gaze predictions and related error measurements by using a sparse Gaussian process interpolation technique on filtered images in the visible spectrum. The technique in [128] learned the eye image manifold by employing locally linear embedding; it significantly reduces the number of calibration points without sacrificing accuracy, and its results are comparable to those of [89].

Appearance-based techniques normally do not necessitate the camera and geometric calibration as the mapping is performed directly on the contents of the images. While appearance-based techniques aim to model the geometry in an implicit manner, head pose invariance has not been reported in the literature. Moreover, since a change in illumination may alter the eye appearance, the accuracy of these techniques is also affected by the different light conditions for the same pose.

3.3.2. Visible Light-Based Techniques

The techniques based on visible or natural light are considered a substitute for the techniques based on IR, especially for outdoor daylight applications [ 27 , 34 , 69 , 76 , 90 , 106 , 114 ]. However, they have limitations due to the light variations in the visible spectrum and poor contrast images.

The authors of [76] modeled the visible part of the subject’s eyeball as a planar surface and regarded gaze shifts due to eyeball rotations as translations of the pupil. Considering the one-to-one mapping between the projective plane and the hemisphere, the authors of [27] modeled the PoR as a homographic mapping from the center of the iris to the monitor. The resultant model is only an approximation, as it does not account for the nonlinearity of the one-to-one mapping; moreover, this technique does not provide head-pose-invariant models. The techniques developed in [90, 106, 127] estimated the gaze direction by employing stereo and face models. The authors of [106] modeled the eyes as spheres and estimated the PoR from the intersection of the two LoG estimates, one for each eye; in their work, a head pose model is used to estimate the eyeball center, and personal calibration is also considered. The authors of [90, 127] combined a narrow-field-of-view camera with a face pose estimation system to compute the LoG through one iris [90] and two irises [127], respectively; they proposed novel eye models in which the iris contours are assumed to be circles in order to approximate their normal directions in three dimensions. Gaze estimation techniques that use rigid facial features are also reported in other studies, such as [63, 129, 130], in which the locations of the eye corners and the iris are tracked by means of a single camera and the visual axis is estimated by various algorithms. The authors of [131] proposed the use of stereo cameras in natural light to estimate the gaze point. While these techniques do not require an IR light source, their accuracy is low, as they are in the initial stages of development.

Finally, it is notable that the lack of light at night reduces the functionality of both human vision and cameras, which results in increased pedestrian fatalities at night. The authors of [132] proposed an approach that utilizes cost-effective arrayed ultrasonic sensors to detect traffic participants in low-speed situations. The results show an overall detection accuracy of 86%, with correct detection rates for cyclists, pedestrians, and vehicles of around 76.7%, 85.7%, and 93.1%, respectively.

3.4. Discussion

The gaze tracking systems which present negligible intrusiveness and minimal usage difficulty are usually sought-after as they allow free head movements. In the modern gaze tracking applications, video-based gaze trackers are gaining increased popularity. They maintain good accuracy (0.5° or better) while providing the user with enhanced freedom of head movement. The recent studies indicate that high-accuracy trackers can be realized if some specific reflections from the cornea are utilized. Furthermore, the resultant gaze estimation is more stable and head pose invariant. However, unfortunately, commercially available, high-accuracy gaze trackers are very expensive. Moreover, there is a trade-off among accuracy, setup flexibility, and the cost for gaze trackers. The readers can find a thorough discussion on performance and preferences of eye tracking systems in [ 133 ]. A comprehensive comparison of gaze estimation methods is provided in Table 2 .

Comparison of gaze estimation methods.

4. Applications in ADAS

4.1. Introduction

A driver’s gaze data can be used to characterize changes in visual and cognitive demand and thus to assess the driver’s alertness [136, 137]. For instance, it is reported that increased cognitive demand affects how drivers allocate their attention to the roadway [138, 139, 140, 141]. As cognitive demand increases, drivers tend to concentrate their gaze in front of the vehicle; this gaze concentration results in a reduced frequency of viewing the speedometer and mirrors and a reduced ability to detect events in the periphery [138, 139, 140, 142, 143, 144, 145]. These patterns are consistent with inattentional blindness, loss of situational awareness, and “looked but failed to see” situations [139, 143, 146].

A prominent and intuitive measure for detecting changes in a driver’s gaze due to increased cognitive demand is the percent road center (PRC), defined as “the percentage of fixations that fall within a predefined road center area during a specific period.” It has been shown that PRC increases with increased cognitive demand [136, 141, 142, 143, 147]. While the concept of PRC is simple to understand, the definition of the road center differs significantly in the literature: it has been defined as a rectangular region centered in front of the vehicle with a width of 15° [141] or 20° [142], or as a circular region of 16° diameter centered on the road center point [142] or on the driver’s most recurrent gaze angle [148]. Some implementations of PRC utilized raw gaze points and gaze trajectories recorded by eye trackers that were not clustered into saccades and fixations; the authors of [148] compared these approaches and observed a strong correlation between raw-gaze-based PRC and fixation-based PRC. To characterize variations in gaze behavior with cognitive demand, the standard deviation of gaze points is also used by several researchers [137, 138, 139, 142]; it is computed either from the projection of the driver’s gaze trail on a plane or from the driver’s gaze angle. A comparison of the various techniques used to characterize changes in drivers’ gaze under cognitive load is presented in [149].
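
A minimal sketch of a raw-gaze-based PRC computation under one of the definitions above (a circular road-center region of 16° diameter); the gaze samples, sampling setup, and region center are hypothetical.

```python
import numpy as np

def percent_road_center(gaze_deg, center_deg=(0.0, 0.0), diameter_deg=16.0):
    """Percentage of gaze samples falling inside a circular road-center region.

    gaze_deg: (N, 2) gaze angles in degrees; center_deg: region center;
    diameter_deg: 16 degrees, following one of the definitions cited in the text.
    """
    offsets = np.asarray(gaze_deg) - np.asarray(center_deg)
    inside = np.linalg.norm(offsets, axis=1) <= diameter_deg / 2.0
    return 100.0 * inside.mean()

# Hypothetical gaze trace: mostly road-center samples with some glances toward a mirror.
rng = np.random.default_rng(1)
gaze = np.vstack([rng.normal(0, 3, (500, 2)),          # near the road center
                  rng.normal([25, -5], 2, (80, 2))])    # glances toward a mirror
print(f"PRC = {percent_road_center(gaze):.1f}%")
```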

The data associated with a driver’s eyes and gaze are utilized by ADAS algorithms to detect the driver’s attentiveness. A typical scheme adopted in ADAS algorithms to detect and improve the driver’s alertness using the driver’s visual data is shown in Figure 4. These algorithms continuously capture the driver’s visual data through numerous sensors attached to the driver’s body and installed inside the vehicle. The obtained visual data are processed in subsequent stages to extract and classify the vital features; a decision is then made on the basis of the data classification and conveyed to the driver in the form of audible or visible signals, as shown in Figure 4.

Figure 4. The stages of visual data in typical advanced driving assistance systems (ADAS) algorithms.

The subsequent sections present a detailed review of the systems and techniques used to detect the visual activities and distraction of a driver. A brief overview of the driving process and its associated challenges is given first, as it aids understanding of the subsequent sections.

4.2. Driving Process and Associated Challenges

The key elements of the driving process are the driver, the vehicle, and the driving environment, as shown in Figure 5. The driver, who plays the pivotal role in this process, has to understand the driving environment (e.g., nearby traffic and road signals), make decisions, and execute the appropriate actions [150]. Thus, the driver’s role has three stages: situational awareness, decision, and action. Situational awareness is considered the most important and complicated stage and can be modeled as a three-step process: the first step is to perceive the elements in the environment within specific limits of time and space; the second step is to comprehend the relative significance of the perceived elements; and the final step is to project their impact in the near future. A driver’s ability to accurately perceive multiple events and entities in parallel depends on his or her attention during the first step (i.e., perception), and consequently the situational awareness stage principally depends on it. The driver’s attention is also necessary for taking in and processing the available information during the decision and action stages. Moreover, in a complex and dynamic driving environment, the need for the driver’s active attention increases in order to protect life and property. Thus, the ADAS continuously monitors the driver’s attention and generates an alarm or a countermeasure if any negligence is observed; the level of the alarm or countermeasure depends on the nature and intensity of the negligence.

Figure 5. Driving process.

Recent studies [ 151 , 152 ] identify three major causes that together account for more than 90% of road accidents: distraction, fatigue, and aggressive driver behavior. The term “fatigue” denotes compromised mental or physical performance and a subjective feeling of drowsiness. For drivers, the most dangerous types are mental and central nervous fatigue, which ultimately lead to drowsiness; other types include local physical fatigue (e.g., skeletal muscle fatigue) and general physical fatigue, which is typically felt after exhausting physical activity. Aggressive driving activities, such as shortcut maneuvers and ignoring speed limits, are also a major cause of road accidents. Since these are primarily deliberate actions, enforcement of local traffic rules seems more effective against them than mere warnings from an ADAS; nevertheless, ADAS are capable of warning about, and in near-autonomous vehicles preventing, their severe consequences. Distraction is defined as the engagement of the driver in a competing parallel task other than driving [ 153 ].

The driver’s performance is severely affected by distraction, which is considered the main cause of nearly half of all accidents [ 154 , 155 ]. Distracting activities include eating, drinking, texting, calling, using in-vehicle technology, and looking at the off-road environment [ 156 , 157 , 158 , 159 ]. According to the NHTSA, these activities are categorized as [ 155 , 159 , 160 ]:

  • Visual distraction (taking the eyes off the road);
  • Physical distraction (e.g., hands off the steering wheel);
  • Cognitive distraction (e.g., mind off the duty of driving);
  • Auditory distraction (e.g., ears off relevant sounds such as horns).

4.3. Visual Distraction and Driving Performance

Human beings have a limited capability to perform multiple tasks simultaneously without compromising performance on all of them. Engaging in a competing task while driving therefore degrades driving performance and, consequently, endangers traffic safety. Driving behavior can be evaluated with certain driving performance indicators [ 161 , 162 ], which include lateral control, reaction time, and speed, as discussed below.

4.3.1. Lateral Control

Lateral control is typically affected by visual distraction. Distracted drivers exhibit larger deviations in lane position because they must compensate for errors made while their eyes were off the road. This increased lane-position variability has been reported by several researchers (e.g., [ 140 , 163 ]). Moreover, as reported in [ 140 ], the steering control of distracted drivers is less smooth than in their attentive driving state. On the other hand, the author of [ 164 ] found no significant difference in the standard deviation of lateral control between distracted and non-distracted drivers. The divergent findings could be due to different test conditions and varying driving behaviors.
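For illustration, the standard deviation of lane position (a common lateral-control indicator, often abbreviated SDLP) can be computed from lane-offset samples as in the following sketch; the synthetic data and noise levels are assumptions:

    # Hypothetical illustration: standard deviation of lane position from
    # lane-centre offsets in metres (larger value => poorer lateral control).
    import numpy as np

    def sdlp(lane_offset_m):
        return float(np.std(lane_offset_m, ddof=1))

    attentive  = np.random.default_rng(1).normal(0.0, 0.15, 600)   # small weaving
    distracted = np.random.default_rng(2).normal(0.0, 0.35, 600)   # larger weaving
    print(sdlp(attentive), sdlp(distracted))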

4.3.2. Reaction Time

Reaction time is quantified using several measures, such as brake reaction time (BRT), detection response time (DRT), and peripheral detection time (PDT). These reaction times provide a measure of the driver’s mental load; they typically increase for visually distracted drivers [ 165 , 166 , 167 ].

4.3.3. Speed

A driver’s distraction by visual stimuli typically results in a speed reduction [ 147 , 163 , 168 ]. The reduced speed is probably a compensatory response to the perceived increase in risk, which slowing down helps to mitigate. However, contradictory findings are reported in [ 164 ], whose authors observed an increased average speed and several speed violations among distracted drivers. They attributed this inconsistency to the very low noise level inside the vehicle: the drivers, believing the vehicle to be at normal speed, did not monitor the speedometer frequently. Since different researchers use different simulation or test environments (e.g., nearby vehicles, road conditions), some conflicting findings are to be expected; moreover, distracted drivers do not all behave the same way with respect to speed control.

4.4. Measurement Approaches

Researchers have exploited features of eye movement data to detect driver distraction and drowsiness [ 169 , 170 ]. The following features related to eyeball and eyelid movements are frequently used in this field [ 171 , 172 , 173 , 174 ]; a sketch computing PERCLOS and blink frequency from an eye-openness trace follows the list.

  • PERCLOS : It is a measure of percentage of eye closure. It corresponds to the percentage of time during a one-minute period for which the eyes remain at least 70% or 80% closed.
  • Percentage eyes >70% closed (PERCLOS70).
  • Percentage eyes >80% closed (PERCLOS80).
  • PERCLOS70 baselined.
  • PERCLOS80 baselined.
  • Blink Amplitude : Blink amplitude is the measure of electric voltage during a blink. Its typical value ranges from 100 to 400 μV.
  • Amplitude/velocity ratio (APCV).
  • APCV with regression.
  • Energy of blinking (EC).
  • EC baselined.
  • Blink Duration: The total time from the start to the end of a blink, typically measured in milliseconds. A challenge for blink-based drowsiness detection techniques is the person-dependent nature of the measure: some people blink frequently while fully awake, and some people’s eyes remain slightly open even when drowsy. Personal calibration is therefore a prerequisite for applying these techniques.
  • Blink Frequency: Blink frequency is the number of blinks per minute. An increased blink frequency is typically associated with the onset of sleep.
  • Lid Reopening Delay : A measure of the time from full eyelid closure to the start of reopening. It is on the order of a few milliseconds for an awake person, increases for a drowsy person, and is prolonged to several hundred milliseconds during a microsleep.
  • Microsleep: An eye blink is detected when the upper eyelid remains in contact with the lower lid for around 200–400 ms; if this duration exceeds 500 ms (but remains below 10 s), the event corresponds to a microsleep [ 173 , 175 ]. A driver’s microsleep can lead to fatal accidents.
  • Microsleep event 0.5 sec rate.
  • Microsleep event 1.0 sec rate.
  • Mean square eye closure.
  • Mean eye closure.
  • Average eye closure speed.
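As referenced above, a minimal sketch of computing PERCLOS70, PERCLOS80, and blink frequency from a one-minute eye-openness trace is given below; the 30 Hz sampling rate, the openness scale, and the blink threshold are illustrative assumptions:

    # Illustrative sketch: PERCLOS70/PERCLOS80 and blink frequency from a
    # one-minute eye-openness trace (0.0 = fully closed, 1.0 = fully open).
    import numpy as np

    def perclos(eye_openness, closure_threshold):
        """Percentage of samples in which the eye is at least `closure_threshold` closed."""
        closed = np.asarray(eye_openness) <= (1.0 - closure_threshold)
        return 100.0 * closed.mean()

    def blink_frequency(eye_openness, fs_hz, closed_level=0.2):
        """Blinks per minute, counted as transitions into the 'closed' state."""
        closed = np.asarray(eye_openness) < closed_level
        onsets = np.flatnonzero(np.diff(closed.astype(int)) == 1)
        minutes = len(eye_openness) / fs_hz / 60.0
        return len(onsets) / minutes

    trace = np.ones(1800)            # 60 s at 30 Hz, eyes open
    trace[300:330] = 0.05            # one 1 s closure (approaching a microsleep)
    print(perclos(trace, 0.70), perclos(trace, 0.80), blink_frequency(trace, 30))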

A driver’s physical activities, such as head movements, are captured and processed in ADAS applications [ 176 , 177 , 178 , 179 ]. Video cameras are installed at suitable locations inside the vehicle to record the driver’s physical movements and gaze data. The main advantage of video-based gaze detection approaches lies in their nonintrusive nature [ 180 , 181 , 182 , 183 ]. For instance, the authors of [ 176 ] modeled and detected a driver’s visual distraction using the pose and position of the driver’s head. However, as the authors themselves note, this technique is prone to false positives, chiefly because a driver may still be looking at the road while his or her head is tilted to one side; the study thus also illustrates the need for high-performance eye and gaze tracking systems in ADAS. The authors of [ 177 ] proposed an improved technique that incorporates the PRC of gaze direction, analyzed over a 1 min epoch. For their setup, they found that PRC < 58% indicated visual distraction, whereas PRC > 92% indicated cognitive distraction.
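The rule reported for the setup of [ 177 ] can be expressed compactly as follows; note that the thresholds are specific to that study’s setup and 1 min epoch and are not general values:

    # Sketch of the decision rule reported in [177]: over a 1 min epoch,
    # PRC < 58% was attributed to visual distraction and PRC > 92% to
    # cognitive distraction (setup-specific thresholds).
    def classify_distraction(prc_percent):
        if prc_percent < 58.0:
            return "visual distraction"
        if prc_percent > 92.0:
            return "cognitive distraction"
        return "normal attention"

    for prc in (45.0, 75.0, 96.0):
        print(prc, "->", classify_distraction(prc))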

The authors of [ 184 ] reported a correlation between driving performance and visual distraction by using gaze duration as a detection feature; the existence of such a correlation was also confirmed by the authors of [ 185 ]. It has been reported that the detection accuracy obtained using eye-movement data alone is nearly equal to that obtained using both eye-movement and driving performance data [ 185 ]. As reported in earlier studies and verified by recent research [ 186 , 187 , 188 , 189 , 190 ], eye-movement features can be used effectively to detect both visual and cognitive distraction. Distracted drivers are found to exhibit longer fixation durations or more frequent fixations towards competing tasks. A cognitively distracted driver also tends to exhibit longer fixation durations on the same area, which can be associated either with a competing task (e.g., multimedia inside the vehicle) or with the periphery of the field of view.

The combined effect of visual and cognitive distraction is also reported in [ 140 ]. It is notable that, by definition, visual distraction differs from cognitive distraction (which includes the “looked but did not see” state), and their effects are not the same. Cognitive distraction disturbs the longitudinal control of the vehicle, whereas visual distraction affects the vehicle’s lateral control and the driver’s steering ability [ 191 ]. Moreover, as discussed in [ 140 ], overcompensation and steering neglect are related to visual distraction, whereas under-compensation is associated with cognitive distraction. Similarly, hard braking is mostly related to cognitive distraction [ 136 , 141 ]. Accidents due to visual distraction are typically more severe than those due to cognitive distraction. The findings of [ 50 ] suggest that during visual distraction alone, the frequency and duration of eye fixations are higher than during combined (visual and cognitive) distraction, while the frequency and duration during combined distraction are in turn higher than during cognitive distraction alone. Notably, adequate situation awareness requires fixation durations and frequencies within a suitable range that depends on the driver and the driving environment; eye-movement features can therefore accurately discriminate between visual and cognitive distraction only if this range is pre-identified for each driver.

In addition to the physical measures discussed above, biological measures such as electrooculography (EOG) also provide data for sleepiness detection. EOG signals are frequently used to measure eye-related activity for medical purposes; however, their use in ADAS applications comes with certain challenges. For example, electrode placement is critical, as the accuracy of the collected data depends on the distance of the electrodes from the eyes [ 192 , 193 ]. Moreover, drivers do not feel comfortable with electrodes attached around their eyes during normal driving, so such experiments are possible in simulator-based studies but not feasible for real-world applications.

Recognizing the relative advantages and limitations of the techniques discussed above, researchers now tend to fuse several of them to produce an optimal distraction detection solution for ADAS. By merging information obtained from the vehicle’s parameters (e.g., turning speed and acceleration) with the driver’s physical and biological parameters, more accurate and reliable results have been reported. For example, the authors of [ 194 ] reported a distraction detection accuracy of 81.1% by fusing saccade, eye-fixation, lateral-control, and steering-wheel data with a support vector machine algorithm. The authors of [ 195 ] detected driver distraction by processing information from physical parameters (blink frequency, location, and eye-fixation duration) and driving performance parameters (steering wheel and lateral control). Using the same physical parameters, the authors of [ 196 ] considered different driving performance measures (i.e., speed, lateral acceleration, and longitudinal deceleration) to detect driver distraction. The authors of [ 197 ] merged biological and physical parameters (head orientation, gaze data, and pupil diameter) to produce more accurate results (91.7% and 93%) using support vector machine and adaptive boosting (AdaBoost) algorithms, respectively. A summary of measurement techniques, their advantages, and their limitations is presented in Table 3.
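A hypothetical sketch of this fusion idea, combining eye-movement and driving-performance features in a support vector machine classifier, is shown below; the feature names, synthetic data, and scikit-learn pipeline are illustrative assumptions and not the cited authors’ implementations:

    # Hypothetical fusion sketch: eye-movement + driving-performance features
    # classified with a support vector machine (synthetic data throughout).
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 200
    # columns: mean fixation duration (s), saccade rate (1/s),
    #          SDLP (m), steering reversal rate (1/min)
    attentive  = rng.normal([0.30, 2.5, 0.18, 6.0], [0.05, 0.4, 0.03, 1.0], (n, 4))
    distracted = rng.normal([0.55, 1.6, 0.32, 9.0], [0.08, 0.4, 0.05, 1.5], (n, 4))

    X = np.vstack([attentive, distracted])
    y = np.array([0] * n + [1] * n)          # 0 = attentive, 1 = distracted

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))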

Table 3. Summary of measurement techniques.

4.5. Data Processing Algorithms

The driver’s eye and gaze data contain information about the driver’s level of alertness. The following features of this visual data are frequently used in ADAS applications (an illustrative extraction of these features over a signal window is sketched after the list):

  • Difference between the maximum and minimum value of the data;
  • Standard deviation of the data;
  • Root mean square value of the data;
  • Duration of the signal data;
  • Maximum difference between any two consecutive values;
  • Median of the data;
  • Mean of the data;
  • Maximum value of the data;
  • Minimum value of the data;
  • Amplitude of the difference between the first value and the last value;
  • Difference between the max and min value of the differential of data.
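As mentioned above, these window features can be extracted with a few lines of code; the following sketch assumes a generic eye or gaze signal (e.g., eyelid aperture or gaze angle) sampled over a time window:

    # Illustrative extraction of the listed window features from a generic signal.
    import numpy as np

    def window_features(x, fs_hz):
        x = np.asarray(x, dtype=float)
        d = np.diff(x)
        return {
            "range":           x.max() - x.min(),        # max - min of the data
            "std":             x.std(),                  # standard deviation
            "rms":             np.sqrt(np.mean(x ** 2)), # root mean square
            "duration_s":      len(x) / fs_hz,           # duration of the window
            "max_step":        np.abs(d).max(),          # max consecutive difference
            "median":          np.median(x),
            "mean":            x.mean(),
            "max":             x.max(),
            "min":             x.min(),
            "first_last_diff": abs(x[-1] - x[0]),        # first vs. last value
            "diff_range":      d.max() - d.min(),        # range of the differential
        }

    print(window_features(np.sin(np.linspace(0, 2 * np.pi, 100)), fs_hz=50))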

Various algorithms have been developed and implemented to model eye and gaze data and to detect a driver’s alertness and intentions. These algorithms use fuzzy logic [ 198 , 199 , 200 , 201 ]; neural networks [ 202 , 203 ]; Bayesian networks [ 113 , 204 , 205 ]; unsupervised, semi-supervised, and supervised machine learning techniques [ 186 , 189 , 206 ]; and combinations of multiple techniques. Depending on the application and the available resources, these algorithms process either the full data or only part of it. For example, the authors of [ 207 ] argued that partitioning gaze into regions is sufficient for keeping the driver safe. Their approach estimates the driver’s gaze region without using eye movements: it extracts facial features and classifies their spatial configuration into six regions in real time. They evaluated the system on a dataset of 50 drivers from an on-road study, achieving an average accuracy of 91.4% at an average decision rate of 11 Hz. Algorithms for special circumstances, such as hazy weather, are also discussed in the literature and belong to the categories already mentioned; for instance, the work in [ 208 ] is based on deep learning. In general, all of these algorithms execute a recursive process similar to the flowchart shown in Figure 6, which illustrates how eye tracking is typically achieved in ADAS applications. Each of the main steps in the flowchart can be realized by any suitable conventional or modern algorithm.

Figure 6. Flowchart of a generic eye tracking algorithm.
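A schematic version of such a recursive tracking loop is sketched below; the camera, detector, and gaze-estimator objects are placeholders for whatever components a concrete system provides, not a specific library’s API:

    # Assumed structure of a generic, recursive eye-tracking loop:
    # acquire a frame, locate face and eyes, estimate gaze, and fall back
    # to re-detection whenever tracking is lost.
    def track_gaze(camera, detector, gaze_estimator):
        eyes = None
        while camera.is_open():
            frame = camera.read()
            if eyes is None:                      # (re-)detection branch
                face = detector.find_face(frame)
                eyes = detector.find_eyes(frame, face) if face is not None else None
            else:                                 # tracking branch
                eyes = detector.track_eyes(frame, eyes)
            if eyes is None:
                continue                          # tracking lost; retry on next frame
            yield gaze_estimator.estimate(frame, eyes)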

Additionally, eye and gaze data are used for early detection of a driver’s intentions, which is an attractive feature of ADAS. Most schemes developed to predict a driver’s maneuvering behavior are based on the hidden Markov model (HMM) and its variants [ 209 , 210 , 211 , 212 ]. These schemes are applied to data obtained from the driver’s gaze sequence [ 9 ] and head position [ 213 ], and feature-based pattern recognition and machine learning techniques are frequently used to process the data [ 214 , 215 , 216 ]. The schemes are designed to detect either a single maneuver, such as a lane change or a turn only [ 211 , 214 , 217 , 218 , 219 ], or multiple maneuvers [ 220 ]. For instance, early detection of the intention to change lane was achieved in [ 221 ] using HMM-based steering behavior models; this work can also differentiate between normal and emergency lane changes. Similarly, researchers have used the relevance vector machine to predict driver intentions to change lanes [ 222 ], apply brakes [ 223 ], and take turns [ 224 ]. Moreover, by applying artificial neural network models to gaze behavior data, the authors of [ 202 ] inferred the driver’s maneuvering intentions. In [ 206 ], deep learning approaches were used for early detection of the driver’s intentions: a recurrent neural network (RNN) with long short-term memory (LSTM) units fuses features associated with the driver and the driving environment to predict maneuvers. These features include face- and eye-related features captured by a face camera, driving parameters, and street-map and scene information. The system developed in [ 206 ] can predict a maneuver 3.5 s in advance, with recall of 77.1% and 87.4% and precision of 84.5% and 90.5% for an out-of-the-box and a customized face tracker, respectively. In addition to feature-based pattern recognition algorithms, linguistic, syntactic pattern recognition algorithms have also been proposed for early detection of driver intent [ 220 ]. The authors of [ 225 ] adopted the random forest algorithm and used transition patterns between individual maneuver states to predict driving style; they showed that using transition probabilities between maneuvers improved the prediction of driving style compared with the traditional maneuver frequencies used in behavioral analysis. Table 4 presents a summary of data processing algorithms used in ADAS that exploit a driver’s eye and gaze data to detect distraction and fatigue.
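To make the HMM idea concrete, the following toy sketch scores an observed gaze-region sequence under two small hand-specified models, one for lane keeping and one for a left lane change, and selects the more likely intent; all probabilities and the region coding are illustrative assumptions, not parameters from the cited works:

    # Toy sketch of HMM-based intent scoring from a gaze-region sequence.
    # Gaze regions: 0 = road ahead, 1 = left mirror, 2 = right mirror.
    import numpy as np

    def sequence_loglik(obs, start_p, trans_p, emit_p):
        """Log-likelihood of a discrete observation sequence (scaled forward algorithm)."""
        alpha = start_p * emit_p[:, obs[0]]
        loglik = np.log(alpha.sum())
        alpha = alpha / alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ trans_p) * emit_p[:, o]
            loglik += np.log(alpha.sum())
            alpha = alpha / alpha.sum()
        return loglik

    # Two hidden states: 0 = scanning ahead, 1 = preparing a maneuver
    start = np.array([0.8, 0.2])
    trans = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
    emit_keep   = np.array([[0.90, 0.05, 0.05],     # lane-keeping model
                            [0.70, 0.15, 0.15]])
    emit_change = np.array([[0.80, 0.10, 0.10],     # lane-change model:
                            [0.30, 0.60, 0.10]])    # frequent left-mirror checks

    observed = [0, 0, 1, 0, 1, 1, 0, 1]             # repeated glances to the left mirror
    ll_keep   = sequence_loglik(observed, start, trans, emit_keep)
    ll_change = sequence_loglik(observed, start, trans, emit_change)
    print("predicted intent:", "left lane change" if ll_change > ll_keep else "lane keeping")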

Table 4. Summary of various eye tracking algorithms.

4.6. Application in Modern Vehicles

Vehicle manufacturing companies use features of drivers’ visual data to offer services and facilities in the high-end models of their vehicles. These vehicles are equipped with cameras, radars, and other sensors to assist drivers in safe and comfortable driving. For example, the Cadillac Super Cruise system uses FOVIO vision technology developed by Seeing Machines: a gumdrop-sized IR camera installed on the steering wheel column determines the driver’s alertness level through precise measurement of eyelid movements and head orientation across the full range of day- and night-time driving conditions. The system works well even when the driver is wearing sunglasses. Table 5 summarizes the features offered by vehicle manufacturing companies.

Table 5. A summary of features offered in modern vehicles.

5. Summary and Conclusions

This paper reviewed eye and gaze tracking systems: their models and techniques, the classification of those techniques, and their advantages and shortcomings. Their application in ADAS for safe and comfortable driving was discussed in detail. While these tracking systems and techniques have improved ADAS applications, there remains significant potential for further development, especially with the emergence of autonomous vehicle technology. The National Highway Traffic Safety Administration (NHTSA) of the USA defines six levels of vehicle automation to provide a common frame of reference for research and discussion among agencies, companies, and other stakeholders [ 241 ]. These levels range from no automation (level 0) to fully automated vehicles (level 5). At the intermediate levels, the automated system has authority to control the vehicle; drivers consequently pay less attention to the road and become more easily distracted, since they feel free to disengage from driving [ 242 , 243 ]. Although vehicle manufacturers and traffic control agencies clearly state that human drivers should monitor the driving environment at these levels, several challenges related to use and application persist. Specifically, can a driver remain disengaged from driving while relying on the ADAS and still maintain a safe driving environment? Similarly, how should the automated system decide when it can save either the vehicle or other property, but not both? Satisfactory answers to these questions remain unclear and are an area of active research.

The authors believe that the mass adoption of eye and gaze trackers depends as much on their cost as on their accurate functioning in natural environments (i.e., under changing light conditions and ordinary head movements). With this in mind, the requirements and features of future eye and gaze trackers are discussed below.

Cost: The prices of existing eye trackers are too high for widespread use by the general public. The high cost is mainly due to the cost of components (e.g., high-quality lenses and cameras), development costs, and a comparatively limited market. To overcome this problem, future eye and gaze trackers should use commonly available, standard off-the-shelf components, such as digital or web cameras. In addition, new theoretical and experimental developments are needed so that accurate eye and gaze tracking can be achieved with low-quality images.

Flexibility: Existing gaze trackers typically require calibration of both the geometric arrangement and the camera(s), which is a tedious job. In certain situations, it may be preferable to calibrate, for example, only the monitor and light sources, without requiring geometric and camera calibration. Such a flexible setup is advantageous for eye trackers intended for on-the-move usage.

Calibration: Present gaze tracking techniques use either a simple prior model with several calibration points or a strong (hardware-calibrated) prior model with a brief calibration session. A future direction in gaze tracking is to develop techniques that require no, or minimal, calibration. We believe that novel eye and gaze models should be developed to realize gaze tracking that is both calibration-free and reliable.

Tolerance: Currently, only partial solutions exist for accommodating eyeglasses and contact lenses. The problems in such situations may be partially solved by using multiple light sources coordinated with the user’s head movement relative to the light source and camera. The trend towards low-cost eye tracking systems is likely to grow as they enter mainstream applications; this, however, can lead to low-accuracy gaze tracking, which may be acceptable for some applications but not for ADAS. We believe that additional modeling approaches, such as modeling the eyeglasses themselves under various light conditions, may be required if eye trackers are to be used in outdoor applications.

Interpretation of gaze: Alongside the technical issues of eye and gaze tracking, the interpretation of the relationship between visual and cognitive states is also very important. Analysis of eye movement behavior helps determine cognitive and emotional states as well as characteristics of human visual perception. Future eye and gaze trackers may exploit eye and gaze data in combination with other gestures; this is a topic of long-term, multidisciplinary research.

Usage of IR and Outdoor Application: IR light is used in eye tracking systems because it is invisible to the user and allows light conditions to be controlled, yielding stable gaze estimation and high-contrast images. A practical drawback of such systems is their limited reliability outdoors, so increased reliability in outdoor usage is a requirement for future eye tracking systems. Current efforts to overcome this limitation are still at the development stage, and further research is required.

Head mounts: Part of the research community emphasizes remote gaze tracking, which eliminates the need for head mounts. However, head-mounted gaze trackers may see a revival owing to the problems associated with remote trackers and the growing attention to small, portable head-mounted displays [ 244 ]. Head-mounted eye tracking systems are usually more precise, as they are minimally affected by external variations and their geometry allows more constraints to be applied.

Author Contributions

M.Q.K. and S.L. conceived and designed the content. M.Q.K. drafted the paper. S.L. supervised the work and critically assessed the draft for revision.

Funding

This research was supported, in part, by the “3D Recognition Project” of the Korea Evaluation Institute of Industrial Technology (KEIT) (10060160); in part, by the “Robocarechair: A Smart Transformable Robot for Multi-Functional Assistive Personal Care” Project of KEIT (P0006886); in part, by the “e-Drive Train Platform Development for Commercial Electric Vehicles based on IoT Technology” Project of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) (20172010000420), sponsored by the Korean Ministry of Trade, Industry and Energy (MOTIE); and, in part, by the Institute of Information and Communication Technology Planning & Evaluation (IITP) Grant sponsored by the Korean Ministry of Science and Information Technology (MSIT): No. 2019-0-00421, AI Graduate School Program.

Conflicts of Interest

The authors declare no conflicts of interest.
