I use a GoPro Max 360-degree camera in my annual #StillStanding project. That means I have also had an excellent chance to work with GoPro files and try to understand their inner logic. In this blog post, I will summarize some of my findings.
What is recorded?
Recording “a video” with a GoPro Max results in recording multiple files. For example, each of my daily 10-minute recordings ends up with something like this:
There are two .THM files (thumbnail images), two .LRV files (low-resolution video files), and two .360 files (high-resolution video files). There are two of each because GoPro caps each file at 4.0 GB, since it relies on FAT32 formatting on the memory card. This corresponds to approximately 8 minutes when recording in the .360 file format. For convenience (and consistency?), the .LRV files are split at the same points, and a new thumbnail is created for each part.
Since the GoPro Max is an action camera, writing new files once in a while is not a bad idea. For long recordings, this ensures that at least parts of the recording are “safe” in case the camera breaks, the battery dies, or similar.
For the end user, having multiple files is annoying. That is probably why the files are automatically merged when uploading to GoPro’s online media library. However, the end user must handle the merging when copying the files straight from the memory card. That is not a big deal, and I will show how it is done below.
The .THM files
The .THM files are thumbnail images that GoPro probably uses for internal preview functions. Each one contains two 180-degree “fisheye” images of the scene:
Looking at the content of a file shows that this is only a JPEG file with a relatively low resolution:
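One quick way to confirm this from the terminal is to look at the first bytes of the file, since JPEG data always begins with the bytes FF D8 FF. The following is only a minimal sketch: the function name is_jpeg and the file name in the usage comment are my own placeholders, and it assumes that head, od, and tr are available, as they are on most Linux systems.

```shell
# Check whether a file starts with the JPEG signature (bytes FF D8 FF).
is_jpeg() {
    # Read the first three bytes and print them as hex without spaces.
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "ffd8ff" ]
}

# Hypothetical usage (the file name is a placeholder):
# is_jpeg thumbnail.THM && echo "This is JPEG data"
```

Since the content is JPEG data, copying a .THM file to a name ending in .jpg should be enough to open it in a regular image viewer.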
The files can be deleted, but I prefer to keep them around as a quick fall-back solution in case I want to display the content of a recording somewhere.
The .LRV files
The low-resolution video files (.LRV) generated by the GoPro Max are also there primarily for internal camera and app usage. They are also coded as dual 180-degree files and can be played in VLC:
Inspecting the content of one of these files shows that it is encoded with the H.264 codec for the video stream and AAC for the audio. My Ubuntu system suggests that it uses a QuickTime container, although I think it is a relatively normal MP4 file for most practical purposes.
The video stream uses an odd format (1408x704 pixels), while the audio uses a standard stereo 48 kHz setting.
The small frame size and low bit rates for both audio and video make the .LRV files relatively small; in fact, they are only around 4% of the size of the .360 files.
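A percentage like this is easy to check from the terminal. The sketch below is not part of my workflow; the function name size_percent and the file names in the usage comment are placeholders. It compares the byte counts of two files and rounds to the nearest whole percent.

```shell
# Print the size of the first file as a rounded percentage of the second.
# Assumes the second file is not empty (otherwise this divides by zero).
size_percent() {
    small=$(( $(wc -c < "$1") ))
    big=$(( $(wc -c < "$2") ))
    # Integer arithmetic, rounded to the nearest whole percent.
    echo $(( (small * 100 + big / 2) / big ))
}

# Hypothetical usage (file names are placeholders):
# size_percent recording.LRV recording.360
```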
Still, the quality of the files is pretty good, so I have ended up using them as the source material for my daily #StillStanding Mastodon updates.
The compressed files
To complicate things further, when going into GoPro’s online Media Library, you get an option to download a “compressed file”.
This is different from the low-resolution file stored on the camera, and it looks like a 4:3-ratio image when opened in a video player.
Inspecting the file content reveals five streams:
- Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 2944x1472 [SAR 1:1 DAR 2:1], 26165 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
- Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
- Stream #0:2(eng): Audio: pcm_s32le (in32 / 0x32336E69), 48000 Hz, 4.0, s32, 6144 kb/s (default)
- Stream #0:3(eng): Data: none (tmcd / 0x64636D74)
- Stream #0:4(eng): Data: bin_data (gpmd / 0x646D7067), 91 kb/s (default)
From what I understand, this file only shows a rectified version of the front camera (at 2944x1472 pixels and stored with the H.264 codec). Then they have included both a stereo audio track (AAC) and a 4.0-channel (ambisonics?) raw audio track, as well as two data channels.
The most useful thing about this file is that you can quickly get a “regular” video out of the 360-degree recording. For most other purposes, I think it is better to go for the .360 files.
The .360 files
Things become more complex when we get to the high-resolution files recorded with the GoPro Max, the so-called .360 files. Inspecting the content of one of these files reveals a video stream of 4096x1344 pixels encoded with H.265 compression.
I am impressed that they have included such a modern and resource-intensive compression format in such a tiny camera! I am also impressed that they store four channels of 32-bit PCM audio at 48 kHz. That is what you would expect from a high-end audio recorder, not a tiny action camera.
The .360 files play nicely out of the box in VLC because they use standard video compression inside.
However, there are several strange things with this image. The first problem is that it only shows part of the scene; where am I? After fiddling around in VLC, I discovered that the file contains two video tracks. Fortunately, in VLC it is possible to choose Video > Video track to watch the remaining part of the image:
Then I can look at myself, albeit with a 90-degree rotation:
Fortunately, GoPro has made a page that describes the specs of GoPro Max and their rationale behind breaking up the file into two video streams. They are maximizing pixels, which makes sense in many ways. However, they also make life difficult for some of their users, including myself, who cannot use their Windows-based software for converting the files.
The details of the .360 files
The discovery of two video streams in a file told me that more things are going on in the .360 files than what is revealed in the standard inspector window. So I had to turn to FFmpeg to give some more details. It is easy to get an overview of what is going on in a file by running the command:
ffmpeg -i file.360
This returns a bunch of different things, but cutting out only the lines starting with “stream” gives us this overview:
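The filtering can be done in the terminal as well. This is a small sketch rather than the exact command I used: the helper name stream_lines is made up, and it simply keeps the lines that begin with “Stream #” once the indentation is ignored. Note that FFmpeg writes its report to stderr, so it has to be redirected before piping.

```shell
# Keep only the "Stream #..." lines from an FFmpeg report read on stdin,
# and strip the leading indentation.
stream_lines() {
    grep -E '^[[:space:]]*Stream #' | sed -E 's/^[[:space:]]+//'
}

# Usage (FFmpeg reports on stderr, hence the 2>&1 redirect):
# ffmpeg -hide_banner -i file.360 2>&1 | stream_lines
```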
- Stream #0:0(eng): Video: hevc (Main) (hvc1 / 0x31637668), yuvj420p(pc, bt709), 4096x1344 [SAR 1:1 DAR 64:21], 30001 kb/s, 25 fps, 25 tbr, 90k tbn, 25 tbc (default)
- Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
- Stream #0:2(eng): Data: none (tmcd / 0x64636D74) (default)
- Stream #0:3(eng): Data: bin_data (gpmd / 0x646D7067), 93 kb/s (default)
- Stream #0:4(eng): Data: none (fdsc / 0x63736466), 18 kb/s (default)
- Stream #0:5(eng): Video: hevc (Main) (hvc1 / 0x31637668), yuvj420p(pc, bt709), 4096x1344 [SAR 1:1 DAR 64:21], 30002 kb/s, 25 fps, 25 tbr, 90k tbn, 25 tbc (default)
- Stream #0:6(eng): Audio: pcm_s32le (in32 / 0x32336E69), 48000 Hz, 4.0, s32, 6144 kb/s (default)
This can be broken down into something more structured:
- 2 video streams (0 and 5). The first contains the main front image, and the other a rotated back image. I do not understand why they are numbered this way, but that does not matter if I know which streams they are. Both video streams are captured in 4096x1344 pixels, with some overlap, as mentioned in the GoPro specs.
- 2 audio streams (1 and 6). The first is a 2-channel stream encoded with AAC (at a decent 189 kbps), and the other is a 4.0-channel stream stored as uncompressed 32-bit PCM audio. This is very exciting; it seems like they are recording an ambisonics-like audio stream. Definitely something I will investigate soon!
- 3 data streams (2, 3, and 4). Stream 2 carries the tmcd tag, which is the standard QuickTime timecode track. The other two appear to be GoPro-specific and probably contain sensor information about motion (is there an accelerometer or gyro in there?) and position (GPS?). This data is irrelevant to my #StillStanding project, so I will leave these streams for another time.
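The bit rates in this listing also make it possible to sanity-check the claim that one file holds roughly 8 minutes. The sketch below adds up the listed rates and divides the size cap by the total. It rests on a few assumptions of mine: the cap is taken as 4 GiB (the FAT32 limit), container overhead is ignored, and the timecode stream is left out since no rate is listed for it. The function name seconds_per_file is made up for the example.

```shell
# Estimate how many seconds of recording fit in one file.
# Arguments: size limit in bytes, followed by per-stream bit rates in kb/s.
seconds_per_file() {
    limit_bytes=$1
    shift
    total_kbps=0
    for rate in "$@"; do
        total_kbps=$(( total_kbps + rate ))
    done
    # bytes -> bits, then divide by bits per second (1 kb/s = 1000 bit/s)
    echo $(( limit_bytes * 8 / (total_kbps * 1000) ))
}

# Rates for streams 0, 1, 3, 4, 5 and 6 from the listing above.
# Assumption: the cap is 4 GiB = 4 * 1024^3 bytes.
seconds_per_file $(( 4 * 1024 * 1024 * 1024 )) 30001 189 93 18 30002 6144
# prints 517
```

That is about 8 minutes and 37 seconds. With a cap of exactly 4,000,000,000 bytes instead, the same calculation gives 481 seconds, so either reading agrees with the roughly 8 minutes per file that I see in practice.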
In any case, I am very impressed by what they have managed to pack into such a tiny camera. The file format is oddly formatted, but thanks to FFmpeg, I have understood most of what is happening inside.
Exporting video from the .360 files
For my #StillStanding project, I am interested in cropping out a video of myself standing still, and for the AMBIENT project, I want to extract a video of the room I am standing in. This is not straightforward when I have two .360 files containing 2 video streams.
I have therefore written a short script that merges the files and extracts the various tracks:
# Create list of files
printf "file '%s'\n" ../1-raw/*.360 > mylist.txt
# Concatenate files and extract tracks
ffmpeg -f concat -safe 0 -i mylist.txt -map 0:0 -map 0:6 -c copy track0.mkv -map 0:5 -map 0:6 -c copy track5.mkv -map 0:1 -c copy track1.aac -map 0:6 -c copy track6.wav
Then I end up with four files:
The reason I am using a Matroska container (.MKV) here is that FFmpeg chokes if you try to write PCM audio into an MPEG-4 container:
Could not find tag for codec pcm_s32le in stream #1, codec not currently supported in container
I could have written the AAC-compressed audio stream into a .MP4 file instead, but have decided to keep the 4.0-channel audio for now.
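One caveat with the list-building step in the script above: the printf line wraps each path in single quotes, which breaks if a file name itself contains a single quote. My GoPro file names never do, but a more defensive variant could look like the sketch below. The function name concat_list is hypothetical, and the escaping follows FFmpeg's quoting rules, where a literal quote inside a quoted string is written as '\''.

```shell
# Print one concat-demuxer "file" line per argument, escaping single quotes.
concat_list() {
    for path in "$@"; do
        # Inside a single-quoted string, a literal ' is written as '\''
        escaped=$(printf '%s' "$path" | sed "s/'/'\\\\''/g")
        printf "file '%s'\n" "$escaped"
    done
}

# Usage, equivalent to the printf line in the script above:
# concat_list ../1-raw/*.360 > mylist.txt
```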
Cropping video from the .360 files
Once I have the two video streams as separate files, it is possible to trim the files and crop out myself from the image. The trimming is done by manually checking the beginning and end of my recording (which are marked by clapping):
video_clap_start='00:01:13'
video_clap_end='00:10:52'
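It can be useful to know how long the trimmed recording will be before running the export. Here is a small sketch for converting the timestamps to seconds; the helper name hms_to_seconds is my own invention. I let awk do the arithmetic because it reads fields such as “08” as decimal numbers, whereas shell arithmetic may try to interpret them as octal.

```shell
# Convert an HH:MM:SS timestamp to whole seconds.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

video_clap_start='00:01:13'
video_clap_end='00:10:52'

# Length of the trimmed recording in seconds.
duration=$(( $(hms_to_seconds "$video_clap_end") - $(hms_to_seconds "$video_clap_start") ))
echo "$duration"
# prints 579
```

That is 9 minutes and 39 seconds for this particular recording.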
There is now a cropping function in the Musical Gestures Toolbox, but I have been running these commands in the terminal, where it is necessary to set the cropping manually. I typically do this by previewing the file with FFplay:
ffplay -i track5.mkv -ss $video_clap_start -vf crop=1430:600:1210:380,transpose=2
Once I find the right coordinates, I crop the file:
ffmpeg -i track5.mkv -ss $video_clap_start -to $video_clap_end -vf crop=1430:600:1210:380,transpose=2 video_out.mp4
This will recompress the file, so it takes some time.
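For reference, the crop filter takes its arguments as width:height:x:y, where x and y give the top-left corner of the rectangle, and transpose=2 then rotates the result 90 degrees counterclockwise. The output of the command above is therefore 600x1430 pixels. When hunting for coordinates, it is easy to end up with a rectangle that extends beyond the 4096x1344 frame, so a quick check like the sketch below can save some trial and error. The function name crop_fits is hypothetical.

```shell
# Succeed if a crop=w:h:x:y rectangle lies inside a frame of the given size.
# Arguments: frame_width frame_height w h x y
crop_fits() {
    [ "$5" -ge 0 ] && [ "$6" -ge 0 ] &&
    [ $(( $5 + $3 )) -le "$1" ] &&
    [ $(( $6 + $4 )) -le "$2" ]
}

# The crop used above: 1210 + 1430 = 2640 <= 4096 and 380 + 600 = 980 <= 1344
crop_fits 4096 1344 1430 600 1210 380 && echo "crop fits"
# prints "crop fits"
```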
I also crop out my “field of view” while standing still. This I take from the first video stream:
ffmpeg -i track0.mkv -ss $video_clap_start -to $video_clap_end -vf crop=2700:1300:700:0 video_out_fov.mp4
The above command only crops the original video stream. It does not apply any reprojection. This means that the resultant video looks odd on the left and right sides:
For my current usage, that is fine. I am interested in the global view, and distortions do not matter. However, in the future, I will need to figure out how to handle the projection properly. I have previously blogged about how to work with the Ricoh Theta 360-degree videos and Garmin VIRB 360 recordings. Some people have managed to (almost) reverse-engineer GoPro’s 360 Video File Format.
Given that GoPro has revealed their tricks for packing many pixels into their files and provides free Windows software for converting the files, I am disappointed that they do not provide any source code for stitching them.
Conclusion
This blog post became much longer than expected. That is mainly because the GoPro Max records so much data and media in a complex way. I am impressed by the achievements of their tiny devices, but I am disappointed about the lack of documentation and tools available to work with all this information. Hopefully, this blog post can help others understand what is happening in their files. I have covered most things related to the video streams, and I will explore the audio and data streams in more detail in future blog posts.