Abstract

In the context of artistic performances, the complexity and diversity of digital interfaces may impair the spectator experience, in particular by hiding the engagement and virtuosity of the performers.

Artists and researchers have made attempts at solving this by augmenting performances with additional information provided through visual, haptic or sonic modalities.

However, the proposed techniques have not yet been formalised, and we believe a clarification of their many aspects is necessary for future research.

In this paper, we propose a taxonomy for what we define as Spectator Experience Augmentation Techniques (SEATs).

We use it to analyse existing techniques and we demonstrate how it can serve as a basis for the exploration of novel ones.

Dimensions

Spatial Alignment

This dimension describes how augmentations are spatially aligned with the performance components. Possible values include: aligned, when the augmentations are perceived as coming from the component itself; co-located, when they can be perceived while maintaining the focus on the performance; and distant, when a shift in focus is required to access them. For instance, visual augmentations of a music performance can be aligned if using a Pepper's ghost augmented-reality display, co-located if projected on a screen behind the musician, or distant if displayed on the spectators' mobile devices. While aligning augmentations seems ideal because it preserves the focus on the performance, the presentation of complex information might cause sensory or cognitive overload for the audience, in which case a more distant presentation is preferable.

Temporal Alignment

This dimension pertains to when spectators can access the augmentations: before, during or after the performance. For instance, verbal explanations may be provided as a pre-performance demo, as comments during the performance, or as a post-performance discussion. If presented before the performance, augmentations may provide cues for the audience without any alteration to the performance. However, if they are too complex or given too far in advance, some of the information may not be remembered when needed. If presented during the performance, the information is given when needed, but it might distract or overwhelm the audience. If presented after, the information needed to understand events may be lacking when they happen, but the magical aspect reeves2005designing of the performance is preserved.

Temporal Density

This dimension relates to the temporal range that the augmentations cover. It can be broadly classified as low, medium or high. For example, augmentations of a music performance can range from displaying only the last played note to displaying the whole score. While a high density will help the audience forge a stronger sense of the performer's intentions or actions over time, and may help them perceive virtuosity or errors, providing too much information might also distract them from the current actions.
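As an illustration of the note-versus-score example above, the following minimal Python sketch selects which notes an augmentation would show for each density value. The names `TemporalDensity` and `visible_notes`, the representation of the score as a list of note names, and the four-note window used for the medium value are all assumptions made for this sketch, not part of the taxonomy.

```python
from enum import Enum


class TemporalDensity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical window size for the medium value: the current note
# plus the three notes played before it.
MEDIUM_WINDOW = 4


def visible_notes(score, position, density):
    """Return the notes covered by the augmentation.

    score    -- list of note names, in playing order (assumed format)
    position -- index of the note currently being played
    density  -- a TemporalDensity value
    """
    if density is TemporalDensity.LOW:
        # Only the last played note.
        return score[position:position + 1]
    if density is TemporalDensity.MEDIUM:
        # A short history ending at the current note.
        start = max(0, position - MEDIUM_WINDOW + 1)
        return score[start:position + 1]
    # HIGH: the whole score, including notes not yet played.
    return list(score)


score = ["C4", "D4", "E4", "F4", "G4", "A4"]
print(visible_notes(score, 4, TemporalDensity.LOW))
print(visible_notes(score, 4, TemporalDensity.MEDIUM))
print(visible_notes(score, 4, TemporalDensity.HIGH))
```

The larger the returned list, the more context the audience has to follow the performer's actions over time, at the cost of more material competing with the current action.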

Temporal Control

This dimension describes the possibility for the audience to control the temporal range covered by the augmentations, i.e. the part of the performance on which they access information. It ranges from none to full. An example would be textual explanations provided one by one, either synchronously with events of the performance or on request, with spectators being able to scroll freely through them. From a design perspective, allowing for temporal control may help spectators build a better understanding of the performance, but it might also lead them to miss augmentations that should be perceived at a specific time.

Semantic Density

This dimension relates to how much information is provided by a SEAT. It can also be seen as the level of detail of the information. Like temporal density, it can be broadly classified as low, medium or high. For instance, augmentations on a digital musical instrument may range from only showing which synthesizer is being played to showing the detailed audio graph, including the activity of all synthesis parameters. This dimension is essential, as one needs to ensure that the density is high enough to provide useful information but not so high that it results in sensory or cognitive overload for the audience.

Semantic Control

Symmetrically to temporal control, this dimension indicates whether the level of semantic detail can be chosen by the audience, allowing for a personalised access to narrative elements benford2018designing or for information that matches the expertise of the spectator capra2020all. It ranges from none to full. In the case of textual explanations, it can for example go from a single level of detail given to all spectators to multiple versions targeted at different levels of expertise, e.g. children / adults or novices / experts. This level may also be selected automatically according to emotional or cognitive states measured through wearable physiological sensors capra2017toward. Regarding the accessibility of the augmentations, increasing the control helps avoid information that is too simple or too complex for the spectator's expertise, but it also increases the complexity of implementation for the performance designers. One also needs to ensure that the control interface is not too difficult to use and that it does not distract spectators from the performance.
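The textual-explanation example above can be sketched as a small selection function. This is only an illustration under stated assumptions: the names `SemanticControl`, `EXPLANATIONS` and `select_explanation` are hypothetical, the two explanation texts are invented placeholders, and only the two ends of the none-to-full range are modelled.

```python
from enum import Enum


class SemanticControl(Enum):
    NONE = "none"   # one version for every spectator
    FULL = "full"   # each spectator chooses a version


# Invented placeholder texts, one per expertise level.
EXPLANATIONS = {
    "novice": "The performer is changing the sound by turning a knob.",
    "expert": "The performer is sweeping the cutoff of a low-pass filter.",
}


def select_explanation(control, requested_level=None, default_level="novice"):
    """Return the explanation text a spectator receives.

    With no semantic control, or if the requested level is unknown,
    the designer's default version is returned.
    """
    if control is SemanticControl.NONE or requested_level not in EXPLANATIONS:
        return EXPLANATIONS[default_level]
    return EXPLANATIONS[requested_level]


print(select_explanation(SemanticControl.NONE, "expert"))
print(select_explanation(SemanticControl.FULL, "expert"))
```

An automatic selection based on measured states, as mentioned above, would replace `requested_level` with a value inferred by the system; that inference is outside the scope of this sketch.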

Presentation Nature

This dimension relates to the form given to the information provided to the audience. Possible values include figurative, abstract, conceptual, and linguistic. For instance, to indicate that a certain key was played by a musician, the augmentation can display, respectively, a close-up of their hands and the keyboard, a colour-changing shape, the note on a scale, or the note name. This dimension involves a trade-off between the explicitness of the information, which is maximised with linguistic or figurative augmentations, and its compactness, which can be optimised with abstract or conceptual augmentations showing only essential information. It also has implications for the aesthetic integration of the augmentations in the performance.

Presentation Modality

This dimension describes how the augmentations are displayed. The modality can be visual, auditory or haptic. For example, the subtle gestures of a performer on a sensor can be amplified using changes in a visual shape that represents the sensor, or through vibrotactile feedback reproducing the gestures. The choice of modality depends on the type of performance and its scalability. While visual and auditory displays can be shared by the whole audience, haptic ones imply more restrictions, as individual devices need to be designed.

Content Nature

This dimension pertains to the nature of the content displayed by the augmentations, i.e. the aspects of the performance on which the SEAT provides information. We identified four possible aspects: technical, to reveal the mechanisms of the interface berthaut2013rouages; gestural, to amplify subtle or hidden movements operrotin:2014; intentional, so that the audience understands what performers are trying to accomplish fyans2009spectator; and causal, so that spectators have a clear perception of whether the performer or autonomous processes are responsible for variations in the sound. The choice of content nature depends strongly on the aim of the augmentations. Showing the intention can highlight the performer's virtuosity or affect the audience's emotional response, while technical augmentations may increase the audience's level of comprehension.

Agents

This dimension describes the representation a SEAT can give of the agents in a performance and of their interactions. Agents are entities that can interact with instruments and with each other, such as three musicians on stage or a composite crew including virtual agents. We envision four values for this dimension. Origin indicates that the SEAT lets spectators perceive who is the source of events in the performance, for example amongst members of an orchestra operrotin:2014 or between the performer and automated processes. Avatar indicates the use of representations of the agents to provide additional information; it can be used in performances where the musicians are not physically located in the same place or when an agent is virtual. Communication indicates that the SEAT reveals information that musicians exchange and that is not directly linked to sound production, such as synchronisation or support signals. Interactions indicates that the SEAT augments the interactions between agents, which can be useful for instruments with shared controls and rich musician-to-musician interfaces such as bf-pd dahl2017bf.

Content Reactivity

This dimension describes the relation between the augmentations and the performance. It can be fixed, when the augmentations are pre-defined and do not change with the performance; semi-fixed, when some elements of the augmentations change according to the performance; and reactive, when augmentations are generated from information extracted in real time during the performance. For instance, visual augmentations can be pre-recorded videos provided at pre-defined moments, or synthetic graphics triggered and adapted dynamically based on the performers' actions. While reactive content guarantees that the augmentations will adapt to changes in the performance, it involves a technical complexity that is not always achievable. For example, performers' intentions cannot be extracted dynamically during the performance. If it is not designed carefully, reactive content may also provide information in a less accessible way than fixed content, e.g. with multiple visual indications overlapping.
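The dimensions described in this section can be summarised as a small data structure, which may be convenient when comparing techniques. The following Python sketch is one possible encoding, not a definitive one: the names `TAXONOMY` and `validate` are hypothetical, the "partial" value for the two control dimensions is an assumed placeholder for the intermediate levels between none and full, and the example description is built from the illustrations given in the text rather than from an existing technique.

```python
# Each dimension maps to the set of values named in the text.
TAXONOMY = {
    "spatial_alignment": {"aligned", "co-located", "distant"},
    "temporal_alignment": {"before", "during", "after"},
    "temporal_density": {"low", "medium", "high"},
    # "partial" is an assumed stand-in for intermediate control levels.
    "temporal_control": {"none", "partial", "full"},
    "semantic_density": {"low", "medium", "high"},
    "semantic_control": {"none", "partial", "full"},
    "presentation_nature": {"figurative", "abstract", "conceptual", "linguistic"},
    "presentation_modality": {"visual", "auditory", "haptic"},
    "content_nature": {"technical", "gestural", "intentional", "causal"},
    "agents": {"origin", "avatar", "communication", "interactions"},
    "content_reactivity": {"fixed", "semi-fixed", "reactive"},
}


def validate(description):
    """Check a SEAT description against the taxonomy.

    description -- dict mapping a dimension name to a set of values
                   (a SEAT may take several values on one dimension).
    Returns a list of problems; the list is empty if the description fits.
    """
    problems = []
    for dimension, values in description.items():
        if dimension not in TAXONOMY:
            problems.append(f"unknown dimension: {dimension}")
            continue
        for value in sorted(set(values) - TAXONOMY[dimension]):
            problems.append(f"unknown value for {dimension}: {value}")
    return problems


# Hypothetical example: the whole score projected behind the musician.
projected_score = {
    "spatial_alignment": {"co-located"},
    "temporal_alignment": {"during"},
    "temporal_density": {"high"},
    "presentation_modality": {"visual"},
    "content_reactivity": {"fixed"},
}

print(validate(projected_score))
print(validate({"presentation_modality": {"olfactory"}}))
```

Values are stored as sets so that a technique combining several values on one dimension, for instance explanations given both before and during the performance, can be described without changing the structure.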

Detail of some SEATs