Superhuman swag: shaping the future of social interfaces.

Part I: Engagement through character.

Maxim Makatchev
Nov 30, 2017 · 10 min read

This is Part I of an updated version of a talk I gave for @Futures_Design at the Mozilla Foundation in San Francisco on July 20th, 2017. Part II: From engagement to relationship, is available here.

Humanoid robots are now capable of backflips, and self-learning algorithms beat human champions in Go, ostensibly the hardest board game there is. Yet the way we interact with technology is just barely tilting the scale from us adapting to a machine towards the machine accepting us just the way we are. The future is almost unanimously pictured as us communicating with machines the way we communicate with each other. This assumes that we, users, are ready to accept the machines as fellow social agents, similar to other humans or pets. In this post, and in Part II, I will try to critically examine this assumption and discuss the practical reasons for this vision of the future.

Why character?

Engagement is key to effective interaction with media. Without engagement, a medium never enters our lives to make an impact. One of the best ways to engage is to tell a story. The stories we find most compelling generally feature a character we can relate to: a believable character.

Advertisers have known this for a long time: effective ads introduce products in the context of a compelling story with relatable characters. Our instinct to relate is so automatic that there are meta-level commercials that make fun of it. The graph below shows how I picture the evolution of media. Character drives engagement, and engagement drives the effectiveness of the media over an ever-increasing set of functions.

The dotted line, which corresponds to human evolution, is optimistically tilted up. I could also draw it tilted down since, arguably, media make us less functional on our own. The main point of the graph, however, is that at some point in time artificial characters overtake humans in their ability to engage. Have we crossed this point? Or is it still in a hypothetical future?

Are humans the best engagers?

The short answer is, not always.

Even though we humans might like to think of ourselves as the greatest communicators on the planet, our psychology makes us underperform in social interactions in a number of ways. Some of the evolutionary mechanisms that served us for thousands of years are still active, but not that helpful in the modern world.

One such psychological mechanism is in-group bias. This is the tendency to disproportionately (and often unreasonably) prefer members of our own in-group. In-group bias can apply to groups defined by ethnicity or gender. It can even appear in groups that were formed randomly and whose members had no chance to interact within or across the groups (check out this research review by Marilynn Brewer).

Humans are also notoriously bad at tolerating and adapting to cognitive and intellectual differences, such as a difference in opinion or a difference in perceived intellectual capability. Flaws in intelligence or morality, as judged by us, are among the few sins for which we almost always feel justified in laying the blame on the individual. Interactions with such people are taxing, which must negatively affect the quality of our encounters with them. Self-improvement articles advise us to exclude such people from our social circles.

By now you might agree that humans do not always exhibit exemplary social behaviors. What's more, even if we did, there are situations where simply being human puts us at a disadvantage.

Too human to succeed

Human faces, with their rich social cues, may distress some children with autism. Robots with minimal design, like Keepon, do not overwhelm them, making them superior to humans for certain interactions with such children.

Keepon interacting with a child. Image via BeatBots.

Similarly, hard-to-engage dementia patients respond well to artificial furry creatures such as the seal robot Paro (video) and the tail-wagging pillow Qoobo. This form of engagement has a clear utility through improved therapeutic outcomes, as the patients are able to follow procedures and exhibit a better general mood.

Paro. Image via Marie Claire, UK.
Qoobo in action. Image via Kickstarter.

Another situation where being human may be a hindrance is when one needs to elicit personally embarrassing information. A virtual agent may be able to elicit more information when it is perceived as autonomous, compared to when the user perceives the agent as tele-operated by a human. In general, studies (not cited here since they are behind paywalls) suggest an inverse relationship between the anthropomorphism of the interface and self-disclosure. Our willingness to share private information may also be affected by the degree of physical embodiment of our interviewer. There is evidence that talking with a physical entity, rather than an animated on-screen character, may reduce the degree of self-disclosure of undesirable behaviors.

In spite of these examples, there is still hope for anthropomorphic creatures. For many everyday interactions, a human likeness is both familiar and relatable.

So what are some of the dimensions of the design space of an engaging anthropomorphic character?

Believable anthropomorphic characters

Performing arts, like theater, have been successful in developing techniques for creating believable characters that engage and transform the audience. It is said that, of all actors, the most believable are children and animals: they play themselves. Other characters, however, are expected to have a cohesive backstory, behaviors, and appearance.

Much like human actors performing as believable characters for their audience, a social robot can be viewed as an actor playing a character that engages human users.

Backstory

According to his evolving storyline, Tank the Roboceptionist is a retired army robot whose romantic endeavors included dating a Pittsburgh Steelers stadium scoreboard.

Tank in his booth chatting with a visitor.

Interactions with Tank can be divided into two categories: those that deploy relational conversational strategies (users open with a greeting) and those that are utilitarian (users begin directly with a question). Our analysis shows that relation-oriented users are more persistent in the face of the robot's non-understandings: they are more likely to rephrase their initial question rather than give up on the interaction. As a result, users who begin with a greeting have better task-completion rates, where the task is getting their question answered by the robot.
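As a toy illustration of this kind of analysis (the log format, greeting list, and numbers below are invented for the sketch, not taken from the actual study), one could tag each interaction by its opening utterance and compare task-completion rates per group:

```python
# Hypothetical sketch: classify interactions with a roboceptionist by their
# opening utterance and compare task-completion rates per group.
GREETINGS = {"hi", "hello", "hey", "good morning", "good afternoon"}

def interaction_style(opening_utterance):
    """Label an interaction 'relational' if it opens with a greeting,
    'utilitarian' if the user goes straight to a question.
    (Crude prefix match; a real classifier would be more careful.)"""
    text = opening_utterance.lower().strip()
    if any(text.startswith(g) for g in GREETINGS):
        return "relational"
    return "utilitarian"

def completion_rate(interactions):
    """Fraction of interactions where the user's question got answered."""
    if not interactions:
        return 0.0
    return sum(1 for i in interactions if i["answered"]) / len(interactions)

# Toy log entries, invented for illustration only.
log = [
    {"opening": "Hi there! Where is room 1502?", "answered": True},
    {"opening": "Where is the registrar?", "answered": False},
    {"opening": "Hello, can you tell me about Tank?", "answered": True},
]
by_style = {"relational": [], "utilitarian": []}
for entry in log:
    by_style[interaction_style(entry["opening"])].append(entry)
for style, group in by_style.items():
    print(style, completion_rate(group))
```

Even this crude split reproduces the shape of the finding: grouping by the opening move lets one compare persistence and completion across the two conversational styles.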

Utilitarian and relational conversational patterns with Tank, the Roboceptionist.

Dramatic delivery

Victor is a trash-talking, Scrabble-playing robot who engages up to three other players in a conversation. To make him more relatable, his Scrabble powers are not infinite: he may not always choose the best move and he takes his time to make a decision. His trash-talking, however, is authored by a dramaturg and a Carnegie Mellon undergraduate, and therefore is potentially superior to that of an average human.

Specific phrases that Victor says are modulated from boasting to self-deprecating, depending on the score of the latest turn and his mood, which varies according to the progress of the game.
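A selection mechanism of this kind might be sketched as follows. The phrase lists, thresholds, and mood-update rule here are invented for illustration (the actual lines were authored by a dramaturg), but the structure mirrors the description above: mood drifts with the progress of the game, and the latest turn's score picks a register between boasting and self-deprecation.

```python
# Hypothetical sketch of mood-modulated trash talk.
PHRASES = {
    "boast": ["Read it and weep.", "Another masterpiece."],
    "neutral": ["Not my best, not my worst."],
    "self_deprecating": ["Even my fans deserve better than that."],
}

def update_mood(mood, my_score, opponent_best):
    """Nudge mood up when the robot outscores the table, down otherwise;
    clamp to [-1, 1]."""
    delta = 0.1 if my_score >= opponent_best else -0.1
    return max(-1.0, min(1.0, mood + delta))

def pick_register(turn_score, mood):
    """Combine the latest turn's score with the running mood."""
    signal = turn_score / 50.0 + mood  # 50 points ~ a strong Scrabble turn
    if signal > 0.5:
        return "boast"
    if signal < -0.3:
        return "self_deprecating"
    return "neutral"

mood = 0.0
mood = update_mood(mood, my_score=42, opponent_best=30)
print(PHRASES[pick_register(turn_score=42, mood=mood)][0])
```

The design choice worth noting is the separation of fast state (the last turn's score) from slow state (mood): the same scoring turn sounds boastful or merely relieved depending on how the game has been going.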

Victor, the scrabble-playing, trash-talking robot.

Cross-cultural believability

Hala is a culture-aware Roboceptionist at the Carnegie Mellon campus in Qatar. She is designed to work in public, in a multi-cultural environment that requires her to conduct conversations in both English and Arabic.

Hala, the culture-aware Roboceptionist at work in Doha, Qatar

Remember the in-group bias from earlier in this post? A closely related phenomenon is homophily: people tend to associate with similar others.

Could a robot like Hala leverage homophily to improve its interactions with users from different cultural backgrounds? As creators of Hala, we decided that to test this hypothesis we would need to achieve a situation where a user identified with the robot and felt that it was culturally similar. Was such a situation even possible? This boils down to an even more basic question: how do you endow a robot with a culture?

Expressing culture

The term culture has many meanings. In the words of Michael Agar, culture "is one of the most widely (mis)used and contentious concepts in the contemporary vocabulary." When tackling this problem, we wanted to deal with something less vague, so we rephrased the problem in terms of ethnicity, specifically communities defined by their native languages: American English or Arabic.

So, how do we go about expressing an ethnicity with a robot?

The robot might be able to elicit ethnic attribution via its appearance or its behaviors. Giving an artifact, like a robot, an appearance that is associated with a particular ethnic group is a challenging design problem in itself: a designer would have to avoid both offensive stereotypes and the uncanny valley.

There is another problem with attempting to achieve ethnic homophily through appearance: robots that interact with multiple people in public spaces will face users of varied ethnic backgrounds. Strong ethnic cues that might improve interactions for some users could have the opposite effect for others.

What about expressing ethnicity through the robot’s behaviors? How does one identify behaviors that elicit ethnic attribution and are not offensive stereotypes? We have chosen a data-driven approach. First, we selected candidate behaviors from ethnographic research and role-play experiments. Second, we tested the videos of the selected behaviors for ethnic saliency via crowdsourcing.

Since there were no data sets of human-human service encounters containing both verbal and non-verbal behaviors, we had to create our own. We asked native speakers of American English and Arabic to role-play receptionist encounters in English, recorded their videos from 2–3 angles, and meticulously transcribed their speech and annotated their body language. The result was an openly available multi-modal corpus of cross-cultural receptionist encounters, which allows visualizations showing intervals of speech, mutual gaze, and shared gaze along the interaction timeline.

Speech and gaze patterns during an interaction between a receptionist and a visitor.

This kind of multi-modal corpus can be mined for the patterns of behavior and language that differ most between the ethnicities we focused on. For example, the data showed that when a receptionist does not know the answer to a question, some native speakers of Arabic chose to provide an explanation, while native speakers of American English tended to deploy another strategy, such as a facial expression known as a lower-lip stretcher.

Even though these behaviors were discovered from the corpus, their effectiveness as cues of ethnicity was still just a hypothesis. We tested such hypotheses by rendering the behaviors (such as in this video and this video) and then asking Mechanical Turk workers of both ethnicities to rate the quality of the interactions and to guess the likely native language of humans who were the prototypes for these behaviors.

Testing the effects of body language requires a body, which may influence the user in complex ways, making it hard to separate the effects of the behaviors from the effects of the appearance. To control for the effects of appearance, we tested every behavior rendered on four faces. Two of the faces were selected from the 18 faces shown below as the most readily attributable to native speakers of American English or Arabic, while the other two faces were less human-like and had low ethnic attribution.
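In a design like this, separating the behavior effect from the face effect amounts to averaging ratings within each cell of a behavior × face grid. A minimal sketch (the rating scale, labels, and numbers are invented for illustration):

```python
# Hypothetical sketch: each crowdsourced rating is tagged with the behavior
# clip and the face it was rendered on; averaging each behavior's scores
# across all four faces helps control for the effect of appearance.
from collections import defaultdict

def mean_ratings_by_behavior(ratings):
    """ratings: list of (behavior, face, score) tuples.
    Returns behavior -> mean score across all faces it was rendered on."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for behavior, face, score in ratings:
        sums[behavior] += score
        counts[behavior] += 1
    return {b: sums[b] / counts[b] for b in sums}

# Toy data: one behavior rendered on four faces, rated on a 1-7 scale.
ratings = [
    ("explain_no_answer", "face_a", 5.0),
    ("explain_no_answer", "face_b", 4.0),
    ("explain_no_answer", "face_c", 4.5),
    ("explain_no_answer", "face_d", 4.5),
]
print(mean_ratings_by_behavior(ratings))
```

A behavior whose mean rating holds up across all four faces is a better candidate cue than one whose rating swings with the face it happens to be rendered on.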

18 faces defined by three skin tones, three hair colors, and two eye colors. Artist: Richard Colburn.
Every verbal and non-verbal behavior was rendered on each of these four faces. Artist: Richard Colburn.

The behavior pairs that were eventually selected as most promising cues of the two ethnicities were implemented on Hala and then tested with native speakers of American English and native speakers of Arabic. The results showed that users tended to recognize the robot’s intended ethnicity via its behaviors. That, however, did not have a significant influence on the quality of the interactions.

Should character creators go to such lengths to design non-verbal behaviors?

Expressive movement

Whether intended as such or not, users may interpret a movement as a social signal. Consider these two sumo-fighting robots. Presumably, their movement is driven by their function, yet we (and those kids in the audience) cannot help but have an emotional response to the physical comedy shown by the robot double act.

Expressive form

Compare the image of Jibo on the right with its imprecise rendering on the left. While head postures in both images are approximately the same, the overall impressions these two shapes produce are starkly different.

So close yet so far. Left image via TIME.

Expressive morphology

Anton Chekhov used to say: "One must never place a loaded rifle on the stage if it isn't going to go off." The same can be said of a social robot's morphology, such as its hands. If present, they are expected to provide social affordances similar to those of human hands. Here is an example where a failure to respect this principle leads to a social gaffe.

Balance of expression and perception

Unlike actors on stage, robots interacting with users do not presume to have a fourth wall. If a robot talks, a user may interject or respond, quite naturally assuming that the robot can hear and understand. If a robot is capable of gazing at the user's face, the robot is expected to read the user's gaze and to know when to break eye contact.

As it happens, rendering behaviors is typically much easier, technically, than recognizing them. As a result, the balance of expression and perception is frequently violated in social robotics.

Summary

Social robots are already superior to humans in some scenarios and can potentially reach or exceed human performance in others. Non-anthropomorphic robots and pets share advantages that can make them easier to interact with (better yet, robots don’t poop). Anthropomorphic robots, on the other hand, require a more holistic character design in order to engage users. Performing arts, like theater, can provide guidelines for the development of engaging believable characters through backstory, behavior, and appearance. Unlike the audience watching a stage performance, however, users will attempt to interact with the robot based on their assumptions about its affordances communicated to them through the robot’s character.

Part II: From engagement to relationship.

Maxim Makatchev

Founder of susuROBO. Talking machines: contributed to roboceptionists Tank and culture-aware Hala, trash-talking scrabble gamebot Victor, Jibo, and Volley.