In a previous post, we discussed how using audio and visual elements strategically in learning design can be tremendously impactful. Multimedia instruction is not new, and for many decades, learning designers around the world have been using multimedia to create dynamic and engaging learning experiences. But there’s more to it than making something visually interesting. When we understand how the human mind processes multimedia, we can better leverage it to foster deeper learning.

In this article, Part 1 of our 4-part series, we’re going to revisit educational psychologist Richard Mayer’s foundational research in multimedia learning. In Parts 2, 3 and 4, we’ll discuss the newly expanded Principles for Multimedia Learning and explore new applications and considerations for these research-backed best practices in multimedia learning.

Why Do We Use Multimedia Instruction?

Mayer defines and justifies the case for multimedia instruction in his book Multimedia Learning:

“Multimedia instruction refers to the presentation of material using both words and pictures, with the intention of promoting learning. The case for multimedia learning rests in the premise that learners can better understand an explanation when it is presented in words and pictures than when it is presented in words alone.” (Mayer, 2020, p. 3)

Consider the following learning experiences, all of which qualify as multimedia instruction, since they use a combination of words (spoken or written) and pictures to promote learning.

  • An aspiring bartender participates in a self-paced, online mixology course. The course contains interactive diagrams of various alcoholic beverages, showing their ingredients and alcohol content.
  • Students in an archaeology class attend a lecture about stone tools. The professor shares an image of a stone tool over the projector and describes its features to the class.
  • An amateur knitter watches a narrated YouTube tutorial about how to knit a winter hat. The video pauses at important moments, zooms in on certain yarn patterns, and highlights key areas with arrows and circles.
  • A student in a microbiology class studies a labeled diagram of a cell in their course textbook.

According to Mayer, people can generally better understand an explanation when it is presented in words and pictures than when it is presented in words alone. When it comes to understanding instructional messages, people possess two modes for processing information: one for verbal material (spoken or written words) and one for pictorial material (images). Presenting material in both words and pictures takes advantage of both modes and uses people’s full capacity for processing information. Moreover, words and pictures are not equivalent; when used in conjunction, they complement one another and fill in one another’s gaps. And when people build connections between words and pictures, they learn more deeply.

The Cognitive Theory of Multimedia Learning

Three assumptions about the human mind underpin Mayer’s theory, and it’s important to be familiar with them when designing multimedia instruction.

  1. People possess separate channels for processing visual and auditory information. This is called the Dual Channel Assumption. The visual channel handles information presented to the eyes, such as photographs, illustrations, live video, animations, and written text. The auditory channel handles information presented to the ears, such as spoken words and non-verbal sounds like music and sound effects.
  2. People can hold only a few words and a few images in each channel and in working memory at a given time. This is called the Limited Capacity Assumption. Working memory is “the ability to hold and manipulate information in the mind over brief intervals,” and it plays a key role in learning (Burmester, 2017; Cowan, 2013). Because working memory is limited, the mind constantly makes unconscious decisions about what to pay attention to and how to build connections between the chosen information and prior knowledge.
  3. People engage in active learning by selecting relevant incoming information, organizing it into coherent mental models, and integrating models with one another and with relevant prior knowledge. This is the Active-Processing Assumption. Essentially, the mind is not like a tape recorder passively absorbing information that can later be recalled verbatim. It actively constructs mental models and makes meaning of information.

Diagram of cognitive theory of multimedia learning described in article

Managing Demands on Cognitive Capacity During Multimedia Learning

We’ve established that the human mind is a dual-channel, limited-capacity, active-processing system that seeks to create mental models. Good multimedia instruction takes these features of the human mind into account, guiding the learner’s cognitive processing without overloading working memory.

When participating in multimedia learning, learners experience three distinct processing demands on their cognitive capacity, some beneficial for learning and some not.

  1. Cognitive processing that does not serve the instructional goal is called extraneous processing. Clunky and confusing design choices create extraneous processing.
  2. Cognitive processing required for the learner to store essential material in working memory is called essential processing. The complexity of the material for the learner dictates essential processing.
  3. Cognitive processing devoted to generating coherent mental structures that connect with one another and with the learner’s prior knowledge is called generative processing. In other words, it is the process of forming a deep understanding of the subject matter. The learner must be motivated to engage in generative processing.

The three demands on cognitive processing present specific challenges for learners and specific opportunities to improve learning design. Consider the following learning experiences:

Extraneous Processing

Example Problem:
An online course for managers displays a screenshot of the homepage of an HR software application. A paragraph of text describing the software’s features is placed to the left of the screenshot. The learner reads the text and then tries to locate the described features in the screenshot. They waste precious cognitive processing capacity looking back and forth between the text and the image to figure out where the items are located in the software. Because this extraneous processing uses up capacity, they have less available for the cognitive processes that actually support learning.

Example Solution:
The screen is redesigned as an interactive labeled graphic. When learners click on various parts of the software screenshot, labels appear containing brief explanations of the features. The learner does not waste cognitive processing locating the items onscreen, and they have more capacity available for the course content.

Essential Processing

Example Problem:
A new homeowner watches a fast-paced, 5-minute-long YouTube video about how to tile a shower. The instructor dives right into the demonstration without first describing the tools needed for the job. The fast pace, unfamiliar terms, and unfamiliar images overwhelm the learner, and they are not able to hold the material in their working memory.

Example Solution:
The video is redesigned to begin with a short description and picture of each of the tools needed and to provide a short summary of the six steps for tiling. The video is then divided into six bookmarked chapters, each corresponding to a step. With knowledge of the tools needed and the ability to revisit chapters, the learner does not feel overwhelmed when watching the video and is more easily able to hold the content in working memory.

Generative Processing

Example Problem:
A human biology e-learning course contains a narrated animation that describes the similarities and differences between meiosis and mitosis. After viewing the animation, the learner moves on to the next video in the course, which is about chromosomal abnormalities. The course does not prompt the learner to actively engage with the material about meiosis and mitosis. The learner is therefore not motivated to do so, and the learner soon forgets the material.

Example Solution:
The course is redesigned to include an activity after the meiosis/mitosis animation. The course presents the learner with an empty Venn diagram, and they must sort a list of features into the correct areas. The activity motivates the learner to consider the similarities and differences between meiosis and mitosis, and by completing the activity, they form a mental structure about the two processes and their roles in cell division.

In each of these examples, adjustments in the design helped reduce unhelpful cognitive processing and encourage beneficial processing.

In general, good multimedia design reduces extraneous processing, manages essential processing, and fosters generative processing. Mayer’s Principles for Multimedia Learning, expanded and updated in 2020, are research-backed strategies for doing just that.

The next three installments in this blog series will address the principles and explore how recent trends and approaches can optimize the learning experience to take advantage of learners’ full capacity for processing information.


Burmester, A. (2017, June 5). Working Memory: How You Keep Things “In Mind” Over the Short Term. Scientific American.

Cowan, N. (2013). Working Memory Underpins Cognitive Development, Learning, and Education. Educational Psychology Review, 26(2), 197-223.

Mayer, R.E. (2020). Multimedia Learning (3rd ed.). Cambridge University Press.

Nora Murphy Silverman is a Senior eLearning Designer/Developer at Illumina Interactive.