Making Research Movies in python
Mon, Aug 6, 2018

Sometimes, you need more than a picture to say 1000 words. A few hundred pictures, all
stuck together in a movie, might give you all of the information that you need, though.
However, making movies in everyone's favourite language (clearly python) is quite
obtuse, at least when you try to use the de-facto plotting library, matplotlib.
There are a bunch of ways that you can go about making a little movie of your results.
An example is shown below: a Kelvin-Helmholtz instability (a density slice) evolving
through time. This was made using FuncAnimation in matplotlib, which we'll discuss
later.
The simple method: generating individual frames
The simplest way to make a movie is to take your regular plot-generating script and run
it n times to get n frames, using slightly different data each time. That will write n
.png images, which you can then stitch together using a utility like ffmpeg.
For example, let’s look at a sine wave:
import numpy as np
import matplotlib.pyplot as plt
import sys
# Grab the frame number from python3 easy_mode.py <x>
frame_number = int(sys.argv[1])
# Some global variables to define the whole run
total_number_of_frames = 100
total_width_of_sine_wave = 2 * np.pi
# How far through are we?
current_factor = frame_number / total_number_of_frames
current_x_data = np.linspace(
    0,
    total_width_of_sine_wave * current_factor,
    frame_number
)
current_y_data = np.sin(current_x_data)
# Now we can do the plotting!
plt.plot(current_x_data, current_y_data)
# Have to set these otherwise we will get one ugly plot!
plt.xlim(0, total_width_of_sine_wave)
plt.ylim(-1.2, 1.2)
plt.xlabel("$x$")
plt.ylabel(r"$\sin(x)$")
# Make me pretty
plt.tight_layout()
plt.savefig("image_{:03d}.png".format(frame_number))
Then, we can make all 100 frames by running a little bash for loop,
for image in {0..99}
do
    python3 easy_mode.py $image
done
and stitch them together in a movie using a (somewhat complicated) ffmpeg
command,
ffmpeg -i image_%03d.png -c:v libx264 -vf fps=25 -pix_fmt yuv420p out_easy_mode.mp4
This ends up looking fine, but takes ages. There are a number of reasons for this:
- We are re-launching python for every frame, which takes quite a while in itself,
- We are re-importing all of the libraries we need every time, as well as generating their internal data structures (such as the matplotlib axis),
- We are re-generating data for every single frame, rather than re-using in-memory data.
This is a very convenient way of making a movie, especially if you already have a
script. But when it is easy to generate the data ahead of time, and all you want to do
is plot a sub-set of it for each frame, it is a huge waste of resources. This is
especially noticeable in science, where even loading the data may take a significant
amount of time. Making this video, including the ffmpeg stitching, took 74 seconds on
my base-model 2017 MacBook Pro.
Using the dreaded FuncAnimation
We can simplify the above massively, at least for this simple case, by using the
built-in matplotlib.animation
API. We’ll no longer need to do that bash for loop,
and the script itself will not have to re-generate all of the data!
There are two ways that you can access animations in matplotlib. The first is
FuncAnimation, where you supply a function that updates a given axes object to generate
each frame; the second is ArtistAnimation, which takes a list of pre-built matplotlib
artist objects.
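We'll use FuncAnimation below, but for reference, the ArtistAnimation route for the
same sine wave looks roughly like the following sketch (the variable names and output
file are my own). You pre-draw one list of artists per frame and matplotlib simply
flips through them, which is simple but can use a lot of memory for long movies:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation

total_number_of_frames = 100
all_x_data = np.linspace(0, 2 * np.pi, total_number_of_frames)
all_y_data = np.sin(all_x_data)

fig, ax = plt.subplots(1)
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.2, 1.2)

# One list of artists per frame; ax.plot already returns a list of Line2D.
# Fixing the colour stops every frame cycling to a new one.
artists = [
    ax.plot(all_x_data[: frame + 1], all_y_data[: frame + 1], color="C0")
    for frame in range(total_number_of_frames)
]

animation = ArtistAnimation(fig, artists, interval=1000 / 25)
animation.save("out_artistanimation.mp4")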
For the sine wave example above, it is fairly simple to write a function that selects a
sub-set of the data for plotting at a given frame n. All we need to do is define a
function, here called animate, which updates the line plot for each frame. We can
pre-generate the data and just select sub-sets of it.
The hard part is finding out how to update the line object. Each type of matplotlib
object behaves in a slightly different way, but they usually have a set_<something>
method that you can use to do this kind of updating; a few examples are sketched below.
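As a quick, non-exhaustive illustration (with throwaway data, since the exact calls
depend on your plot), the setters for a few common artist types look something like
this:
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)

# Line2D (from ax.plot): replace both x and y in one go
line, = ax.plot([0, 1], [0, 1])
line.set_data([0, 1, 2], [0, 1, 4])

# PathCollection (from ax.scatter): takes an (N, 2) array of positions
scatter = ax.scatter([0, 1], [0, 1])
scatter.set_offsets(np.array([[0.0, 0.0], [1.0, 2.0]]))

# AxesImage (from ax.imshow): swap the underlying grid of values
image = ax.imshow(np.zeros((4, 4)), vmin=0, vmax=1)
image.set_array(np.random.rand(4, 4))

# Text (e.g. the title): change the string, handy for frame counters
title = ax.set_title("frame 0")
title.set_text("frame 1")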
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Some global variables to define the whole run
total_number_of_frames = 100
total_width_of_sine_wave = 2 * np.pi
all_x_data = np.linspace(0, total_width_of_sine_wave, total_number_of_frames)
all_y_data = np.sin(all_x_data)
def animate(frame, line):
    """
    Animation function. Takes the current frame number (to select the portion of
    data to plot) and a line object to update.
    """

    # Not strictly necessary, just so we know we are stealing these from
    # the global scope
    global all_x_data, all_y_data

    # We want up-to and _including_ the frame'th element
    current_x_data = all_x_data[: frame + 1]
    current_y_data = all_y_data[: frame + 1]

    line.set_xdata(current_x_data)
    line.set_ydata(current_y_data)

    # This comma is necessary!
    return (line,)
# Now we can do the plotting!
fig, ax = plt.subplots(1)
# Initialise our line
line, = ax.plot([0], [0])
# Have to set these otherwise we will get one ugly plot!
ax.set_xlim(0, total_width_of_sine_wave)
ax.set_ylim(-1.2, 1.2)
ax.set_xlabel("$x$")
ax.set_ylabel(r"$\sin(x)$")
# Make me pretty
fig.tight_layout()
animation = FuncAnimation(
    # Your Matplotlib Figure object
    fig,
    # The function that does the updating of the Figure
    animate,
    # Frame information (here just frame number)
    np.arange(total_number_of_frames),
    # Extra arguments to the animate function
    fargs=[line],
    # Frame-time in ms; i.e. for a given frame-rate x, 1000/x
    interval=1000 / 25,
)
animation.save("out_funcanimation.mp4")
Thankfully, it looks exactly the same!
Making this movie using the FuncAnimation method took 2.16 seconds on my base-model
2017 (13 inch) MacBook Pro. That's a speed-up of around 30x, just from not re-launching
the python interpreter for every frame! We also didn't need any kind of clumsy bash
for-loop to make the frames, and didn't need to hand-call ffmpeg; that was all handled
for us.
2D Grid Movie
A very common use of a movie is to show the time-evolution of a visualisation - like
the movie of the Kelvin-Helmholtz instability that I showed earlier. This is a little
more difficult than it might first seem, because of the way that the object returned
by imshow or pcolormesh behaves. Unfortunately, those two return different things, so
we'll focus on imshow in this specific example as it seems the most popular.
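That said, the same idea works with pcolormesh; you just update the QuadMesh that it
returns instead. Here's a rough sketch with random data (the variable names and output
file are mine; depending on your matplotlib version, set_array may also accept the 2D
array directly):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

all_data = [np.random.rand(64, 64) for _ in range(50)]

fig, ax = plt.subplots(1)

# Make sure you set vmin and vmax here too!
mesh = ax.pcolormesh(all_data[0], vmin=0, vmax=1)

def animate(frame):
    # Older matplotlib wants the values flattened; newer versions also take 2D
    mesh.set_array(all_data[frame].ravel())
    return (mesh,)

animation = FuncAnimation(fig, animate, np.arange(len(all_data)), interval=1000 / 25)
animation.save("out_pcolormesh.mp4")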
The set-up is very similar. Just write an animate function that loads (or finds in
memory) the correct data and passes it to a set_<x> method. You will also want to do
some playing around with the figure properties to make sure that you are only plotting
on the right pixels. To do that:
- Set the figsize parameter to (1, 1)
- You can then set the number of pixels in your output video with the dpi parameter of
  the save method on your FuncAnimation object - set this to the same number of pixels
  as in your data (to avoid smoothing)
- Use subplots_adjust on your figure to remove any bounding whitespace
- Finally, use .axis("off") on your axis object to remove the thin black line that
  normally bounds the plot
Don't forget to manually set vmin and vmax on your colour map, or normalise your data
yourself. Otherwise, the limits will be set from the initial frame alone, and later
frames may end up completely washed out.
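One simple way to "normalise your data yourself" is to compute the colour limits from
every frame up front, rather than letting imshow guess them from frame zero. A minimal
sketch, with made-up data standing in for yours:
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data; imagine each frame having a different dynamic range
all_data = [np.random.rand(256, 256) * (frame + 1) for frame in range(100)]

# Fix the colour limits using *all* of the data, not just the first frame
global_vmin = min(frame.min() for frame in all_data)
global_vmax = max(frame.max() for frame in all_data)

fig, ax = plt.subplots(1)
image = ax.imshow(all_data[0], vmin=global_vmin, vmax=global_vmax)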
Here’s a script that does that with some random data:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
# Some global variables to define the whole run
total_number_of_frames = 100
all_data = [
    np.random.rand(512, 512) for x in range(total_number_of_frames)
]
def animate(frame):
    """
    Animation function. Takes the current frame number (to select the portion of
    data to plot) and updates the global image object.
    """

    # Not strictly necessary, just so we know we are stealing these from
    # the global scope
    global all_data, image

    # Swap the frame'th array into the existing image
    image.set_array(all_data[frame])

    return image
# Now we can do the plotting!
fig, ax = plt.subplots(1, figsize=(1, 1))
# Remove a bunch of stuff to make sure we only 'see' the actual imshow
# Stretch to fit the whole plane
fig.subplots_adjust(0, 0, 1, 1)
# Remove bounding line
ax.axis("off")
# Initialise our plot. Make sure you set vmin and vmax!
image = ax.imshow(all_data[0], vmin=0, vmax=1)
animation = FuncAnimation(
    # Your Matplotlib Figure object
    fig,
    # The function that does the updating of the Figure
    animate,
    # Frame information (here just frame number)
    np.arange(total_number_of_frames),
    # Extra arguments to the animate function
    fargs=[],
    # Frame-time in ms; i.e. for a given frame-rate x, 1000/x
    interval=1000 / 25,
)
# Try to set the DPI to the actual number of pixels you're plotting
animation.save("out_2dgrid.mp4", dpi=512)
Here's the movie that this produces!
Sticking it all together (with ffmpeg)
Sometimes you have more than one movie that you would like to show side-by-side. There
are a bunch of ways you can do this - even by just using two image objects in a single
FuncAnimation, which is sketched just below - but it's often nice to do things outside
of python, as you may want to make different combinations. I would recommend making a
single "movie" with each python script, and then using an ffmpeg filter to stick them
together.
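For completeness, here's roughly what the two-image FuncAnimation approach looks like
before we move on to ffmpeg (a minimal sketch with random data; the variable names and
output file are mine):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

total_number_of_frames = 100
left_data = [np.random.rand(128, 128) for _ in range(total_number_of_frames)]
right_data = [np.random.rand(128, 128) for _ in range(total_number_of_frames)]

fig, (left_ax, right_ax) = plt.subplots(1, 2)
for axis in (left_ax, right_ax):
    axis.axis("off")

left_image = left_ax.imshow(left_data[0], vmin=0, vmax=1)
right_image = right_ax.imshow(right_data[0], vmin=0, vmax=1)

def animate(frame):
    # Update both panels from the same frame number
    left_image.set_array(left_data[frame])
    right_image.set_array(right_data[frame])
    return (left_image, right_image)

animation = FuncAnimation(
    fig, animate, np.arange(total_number_of_frames), interval=1000 / 25
)
animation.save("out_side_by_side.mp4")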
Say you have two videos, one called x.mp4 and the other called y.mp4, that you want to
display side-by-side. ffmpeg makes this very easy through its complex filter option, so
all you have to do is write
ffmpeg -i x.mp4 -i y.mp4 -filter_complex hstack out.mp4
That was easy! Things get a little more complicated when you want more videos in your
layout, though. Then you’ll have to learn a little bit of ffmpeg
magic.
In this case, we have four videos: a.mp4, b.mp4, c.mp4, and d.mp4. We want to stack
them in a 2x2 grid. To do this, we have two options. We could run ffmpeg three times -
twice with hstack to put pairs of them together, and once with vstack to stack those
2x1's on top of each other (a sketch of that version is given after the breakdown
below). Alternatively, we can use variables within ffmpeg itself. Here's the command to
do that - we'll break it down afterwards:
ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex "[0:v][1:v]hstack[top];[2:v][3:v]hstack[bottom];[top][bottom]vstack[out]" -map "[out]" out.mp4
There are a few things here that aren't immediately obvious:
- [i:v] references the -i <video> inputs in the order they were given, as a kind of
  variable - here [0:v] corresponds to a.mp4 and [2:v] corresponds to c.mp4
- To apply a filter, you put the input 'variables' on the left and the output variable
  on the right: [<input 1>][<input 2>]<filter>[<output>]
- The semicolons in the filter_complex string separate individual filters
- We need to map [out] to the global scope to let ffmpeg know that's the variable we
  want to write to file
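And if you'd rather take the three-pass route mentioned above, it would look something
like this (the intermediate file names top.mp4 and bottom.mp4 are my own choice):
ffmpeg -i a.mp4 -i b.mp4 -filter_complex hstack top.mp4
ffmpeg -i c.mp4 -i d.mp4 -filter_complex hstack bottom.mp4
ffmpeg -i top.mp4 -i bottom.mp4 -filter_complex vstack out.mp4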
Here’s what that looks like when we stick four movies with different colourmaps together (bonus points for those who know the names of all of these, and which one you should never use):
Summary
- You can stick together a bunch of frames generated from a script with ffmpeg, but
  it's super slow!
- matplotlib's FuncAnimation is a bit weird, but ultimately very helpful
- ffmpeg can stick your movies together for you, allowing you to not have to worry
  about re-generating frames quite as often
And with that, I’ll let you go forth and make some fantastic movies!