Making Research Movies in python
Mon, Aug 6, 2018

Sometimes, you need more than a picture to say 1000 words. A few hundred pictures, all
stuck together in a movie, might give you all of the information that you need, though.
However, making movies in everyone's favourite language (clearly python) is quite
obtuse, at least when you try to use the de-facto plotting library, matplotlib.
There are a bunch of ways that you can go about making a little movie of your results.
An example is shown below: a Kelvin-Helmholtz instability (a density slice) evolving
through time. This was made using FuncAnimation in matplotlib, which we'll discuss
later.
The simple method: generating individual frames
The simplest way to make a movie is to take your regular plot-generating script and run
it n times to get n frames, using slightly different data each time. That will write n
.png images, which you can then stitch together using a utility like ffmpeg.
For example, let’s look at a sine wave:
import numpy as np
import matplotlib.pyplot as plt
import sys
# Grab the frame number from python3 easy_mode.py <x>
frame_number = int(sys.argv[1])
# Some global variables to define the whole run
total_number_of_frames = 100
total_width_of_sine_wave = 2 * np.pi
# How far through are we?
current_factor = frame_number / total_number_of_frames
current_x_data = np.linspace(
    0,
    total_width_of_sine_wave * current_factor,
    frame_number
)
current_y_data = np.sin(current_x_data)
# Now we can do the plotting!
plt.plot(current_x_data, current_y_data)
# Have to set these otherwise we will get one ugly plot!
plt.xlim(0, total_width_of_sine_wave)
plt.ylim(-1.2, 1.2)
plt.xlabel("$x$")
plt.ylabel(r"$\sin(x)$")
# Make me pretty
plt.tight_layout()
plt.savefig("image_{:03d}.png".format(frame_number))
Then, we can make all 100 frames by running a little bash for loop,
for image in {0..99}
do
    python3 easy_mode.py $image
done
and stitch them together in a movie using a (somewhat complicated) ffmpeg
command,
ffmpeg -i image_%03d.png -c:v libx264 -vf fps=25 -pix_fmt yuv420p out_easy_mode.mp4
This ends up looking fine, but takes ages. There are a number of reasons for this:
- We are re-launching python for every frame, which takes quite a while in itself,
- We are re-importing all of the libraries we need every time, as well as generating their internal data structures (such as the matplotlib axis),
- We are re-generating data for every single frame, rather than re-using in-memory data.
This is a very convenient way of making a movie, especially if you already have a
script. But when it is easy to generate the data ahead of time, and all you want to do
is plot a sub-set of it for each frame, it is a huge waste of resources. This is
especially noticeable in science, where even loading the data may take a significant
amount of time. Making this video, including the ffmpeg stitching, took 74 seconds on
my base-model 2017 MacBook Pro.
Using the dreaded FuncAnimation
We can simplify the above massively, at least for this simple case, by using the
built-in matplotlib.animation
API. We’ll no longer need to do that bash for loop,
and the script itself will not have to re-generate all of the data!
There are two ways that you can access animations in matplotlib. The first is
FuncAnimation, where you supply a function that updates a given axes object to generate
each frame; the second is ArtistAnimation, which takes a list of pre-built matplotlib
artist objects.
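We'll use FuncAnimation below, but for reference, the ArtistAnimation route for the
same sine wave looks roughly like the following sketch (the variable names and output
file are my own). You pre-draw one list of artists per frame and matplotlib simply
flips through them, which is simple but can use a lot of memory for long movies:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation

total_number_of_frames = 100
all_x_data = np.linspace(0, 2 * np.pi, total_number_of_frames)
all_y_data = np.sin(all_x_data)

fig, ax = plt.subplots(1)
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.2, 1.2)

# One list of artists per frame; ax.plot already returns a list of Line2D.
# Fixing the colour stops every frame cycling to a new one.
artists = [
    ax.plot(all_x_data[: frame + 1], all_y_data[: frame + 1], color="C0")
    for frame in range(total_number_of_frames)
]

animation = ArtistAnimation(fig, artists, interval=1000 / 25)
animation.save("out_artistanimation.mp4")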
For the sine wave example above, it is fairly simple to write a function that selects a
sub-set of the data for plotting at a given frame n. All we need to do is define a
function, here called animate, which updates the line plot for each frame. We can
pre-generate the data and just select sub-sets of it.
The hard part is finding out how to update the line object. Each type of matplotlib
object behaves in a slightly different way, but they usually have a set_<something>
method that you can use to do this kind of updating; a few examples are sketched below.
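As a quick, non-exhaustive illustration (with throwaway data, since the exact calls
depend on your plot), the setters for a few common artist types look something like
this:
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)

# Line2D (from ax.plot): replace both x and y in one go
line, = ax.plot([0, 1], [0, 1])
line.set_data([0, 1, 2], [0, 1, 4])

# PathCollection (from ax.scatter): takes an (N, 2) array of positions
scatter = ax.scatter([0, 1], [0, 1])
scatter.set_offsets(np.array([[0.0, 0.0], [1.0, 2.0]]))

# AxesImage (from ax.imshow): swap the underlying grid of values
image = ax.imshow(np.zeros((4, 4)), vmin=0, vmax=1)
image.set_array(np.random.rand(4, 4))

# Text (e.g. the title): change the string, handy for frame counters
title = ax.set_title("frame 0")
title.set_text("frame 1")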
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Some global variables to define the whole run
total_number_of_frames = 100
total_width_of_sine_wave = 2 * np.pi
all_x_data = np.linspace(0, total_width_of_sine_wave, total_number_of_frames)
all_y_data = np.sin(all_x_data)
def animate(frame, line):
    """
    Animation function. Takes the current frame number (to select the portion of
    data to plot) and a line object to update.
    """

    # Not strictly necessary, just so we know we are stealing these from
    # the global scope
    global all_x_data, all_y_data

    # We want up-to and _including_ the frame'th element
    current_x_data = all_x_data[: frame + 1]
    current_y_data = all_y_data[: frame + 1]

    line.set_xdata(current_x_data)
    line.set_ydata(current_y_data)

    # This comma is necessary!
    return (line,)
# Now we can do the plotting!
fig, ax = plt.subplots(1)
# Initialise our line
line, = ax.plot([0], [0])
# Have to set these otherwise we will get one ugly plot!
ax.set_xlim(0, total_width_of_sine_wave)
ax.set_ylim(-1.2, 1.2)
ax.set_xlabel("$x$")
ax.set_ylabel(r"$\sin(x)$")
# Make me pretty
fig.tight_layout()
animation = FuncAnimation(
    # Your Matplotlib Figure object
    fig,
    # The function that does the updating of the Figure
    animate,
    # Frame information (here just frame number)
    np.arange(total_number_of_frames),
    # Extra arguments to the animate function
    fargs=[line],
    # Frame-time in ms; i.e. for a given frame-rate x, 1000/x
    interval=1000 / 25,
)
animation.save("out_funcanimation.mp4")
Thankfully, it looks exactly the same!
Making this movie using the FuncAnimation method took 2.16 seconds on my base-model
2017 (13 inch) MacBook Pro. That's a speed-up of around 30x, just from not re-launching
the python interpreter for every frame! We also didn't need any kind of clumsy bash
for-loop to make the frames, and didn't need to hand-call ffmpeg; that was all handled
for us.
2D Grid Movie
A very common use of a movie is to show the time-evolution of a visualisation - like
the movie of the Kelvin-Helmholtz instability that I showed earlier. This is a little
more difficult than it might first seem, because of the way that the object returned
by imshow or pcolormesh behaves. Unfortunately, those two return different things, so
we'll focus on imshow in this specific example as it seems the most popular.
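That said, the same idea works with pcolormesh; you just update the QuadMesh that it
returns instead. Here's a rough sketch with random data (the variable names and output
file are mine; depending on your matplotlib version, set_array may also accept the 2D
array directly):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

all_data = [np.random.rand(64, 64) for _ in range(50)]

fig, ax = plt.subplots(1)

# Make sure you set vmin and vmax here too!
mesh = ax.pcolormesh(all_data[0], vmin=0, vmax=1)

def animate(frame):
    # Older matplotlib wants the values flattened; newer versions also take 2D
    mesh.set_array(all_data[frame].ravel())
    return (mesh,)

animation = FuncAnimation(fig, animate, np.arange(len(all_data)), interval=1000 / 25)
animation.save("out_pcolormesh.mp4")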
The set-up is very similar. Just write an animate function that loads (or finds in
memory) the correct data and passes it to a set_<x> method. You will also want to do
some playing around with the figure properties to make sure that you are only plotting
on the right pixels. To do that:
- Set the figsize parameter to (1, 1)
- You can then set the number of pixels in your output video with the dpi parameter of
  the save method on your FuncAnimation object - set this to the same number of pixels
  as in your data (to avoid smoothing)
- Use subplots_adjust on your figure to remove any bounding whitespace
- Finally, use .axis("off") on your axis object to remove the thin black line that
  normally bounds the plot
Don't forget to manually set vmin and vmax on your colour map, or normalise your data
yourself. Otherwise, the limits will be set from the initial frame alone, and later
frames may end up completely washed out.
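One simple way to "normalise your data yourself" is to compute the colour limits from
every frame up front, rather than letting imshow guess them from frame zero. A minimal
sketch, with made-up data standing in for yours:
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data; imagine each frame having a different dynamic range
all_data = [np.random.rand(256, 256) * (frame + 1) for frame in range(100)]

# Fix the colour limits using *all* of the data, not just the first frame
global_vmin = min(frame.min() for frame in all_data)
global_vmax = max(frame.max() for frame in all_data)

fig, ax = plt.subplots(1)
image = ax.imshow(all_data[0], vmin=global_vmin, vmax=global_vmax)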
Here’s a script that does that with some random data:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Some global variables to define the whole run
total_number_of_frames = 100
all_data = [
    np.random.rand(512, 512) for x in range(total_number_of_frames)
]
def animate(frame):
    """
    Animation function. Takes the current frame number (to select the portion of
    data to plot) and updates the global image object.
    """

    # Not strictly necessary, just so we know we are stealing these from
    # the global scope
    global all_data, image

    # Swap the frame'th array into the existing image
    image.set_array(all_data[frame])

    return image
# Now we can do the plotting!
fig, ax = plt.subplots(1, figsize=(1, 1))
# Remove a bunch of stuff to make sure we only 'see' the actual imshow
# Stretch to fit the whole plane
fig.subplots_adjust(0, 0, 1, 1)
# Remove bounding line
ax.axis("off")
# Initialise our plot. Make sure you set vmin and vmax!
image = ax.imshow(all_data[0], vmin=0, vmax=1)
animation = FuncAnimation(
    # Your Matplotlib Figure object
    fig,
    # The function that does the updating of the Figure
    animate,
    # Frame information (here just frame number)
    np.arange(total_number_of_frames),
    # Extra arguments to the animate function
    fargs=[],
    # Frame-time in ms; i.e. for a given frame-rate x, 1000/x
    interval=1000 / 25,
)
# Try to set the DPI to the actual number of pixels you're plotting
animation.save("out_2dgrid.mp4", dpi=512)
Here's the movie that this produces!
Sticking it all together (with ffmpeg)
Sometimes you have more than one movie that you would like to show side-by-side. There
are a bunch of ways you can do this - even by just using two image objects in a single
FuncAnimation, which is sketched just below - but it's often nice to do things outside
of python, as you may want to make different combinations. I would recommend making a
single "movie" with each python script, and then using an ffmpeg filter to stick them
together.
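For completeness, here's roughly what the two-image FuncAnimation approach looks like
before we move on to ffmpeg (a minimal sketch with random data; the variable names and
output file are mine):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

total_number_of_frames = 100
left_data = [np.random.rand(128, 128) for _ in range(total_number_of_frames)]
right_data = [np.random.rand(128, 128) for _ in range(total_number_of_frames)]

fig, (left_ax, right_ax) = plt.subplots(1, 2)
for axis in (left_ax, right_ax):
    axis.axis("off")

left_image = left_ax.imshow(left_data[0], vmin=0, vmax=1)
right_image = right_ax.imshow(right_data[0], vmin=0, vmax=1)

def animate(frame):
    # Update both panels from the same frame number
    left_image.set_array(left_data[frame])
    right_image.set_array(right_data[frame])
    return (left_image, right_image)

animation = FuncAnimation(
    fig, animate, np.arange(total_number_of_frames), interval=1000 / 25
)
animation.save("out_side_by_side.mp4")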
Say you have two videos, one called x.mp4 and the other called y.mp4, that you want to
display side-by-side. ffmpeg makes this very easy through its complex filter option, so
all you have to do is write
ffmpeg -i x.mp4 -i y.mp4 -filter_complex hstack out.mp4
That was easy! Things get a little more complicated when you want more videos in your
layout, though. Then you’ll have to learn a little bit of ffmpeg
magic.
In this case, we have four videos: a.mp4, b.mp4, c.mp4, and d.mp4. We want to stack
them in a 2x2 grid. To do this, we have two options. We could run ffmpeg three times -
twice with hstack to put pairs of them together, and once with vstack to stack those
2x1's on top of each other (a sketch of that version is given after the breakdown
below). Alternatively, we can use variables within ffmpeg itself. Here's the command to
do that - we'll break it down afterwards:
ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex "[0:v][1:v]hstack[top];[2:v][3:v]hstack[bottom];[top][bottom]vstack[out]" -map "[out]" out.mp4
There are a few things here that aren't immediately obvious:
- [i:v] references the -i <video> inputs in the order they were given, as a kind of
  variable - here [0:v] corresponds to a.mp4 and [2:v] corresponds to c.mp4
- To apply a filter, you put the input 'variables' on the left and the output variable
  on the right: [<input 1>][<input 2>]<filter>[<output>]
- The semicolons in the filter_complex string separate individual filters
- We need to map [out] to the global scope to let ffmpeg know that's the variable we
  want to write to file
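And if you'd rather take the three-pass route mentioned above, it would look something
like this (the intermediate file names top.mp4 and bottom.mp4 are my own choice):
ffmpeg -i a.mp4 -i b.mp4 -filter_complex hstack top.mp4
ffmpeg -i c.mp4 -i d.mp4 -filter_complex hstack bottom.mp4
ffmpeg -i top.mp4 -i bottom.mp4 -filter_complex vstack out.mp4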
Here’s what that looks like when we stick four movies with different colourmaps together (bonus points for those who know the names of all of these, and which one you should never use):
Summary
- You can stick together a bunch of frames generated from a script with ffmpeg, but
  it's super slow!
- matplotlib's FuncAnimation is a bit weird, but ultimately very helpful
- ffmpeg can stick your movies together for you, allowing you to not have to worry
  about re-generating frames quite as often
And with that, I’ll let you go forth and make some fantastic movies!