Making hyperlapse videos is an art of precision. It requires photographers either to take photos at consistent intervals with a camera mounted on a tripod or to use expensive equipment like stabilizers and motorized dollies, and the process often takes a long time. To address this issue, we present HUSTL (Hyper-Ultra-Super-Time-Lapse), an open-source three-stage software pipeline based on state-of-the-art academic research, to ease the production of high-quality hyperlapse videos. Our pipeline takes either photos or videos as input and produces videos with consistent colors and stabilized shots.
My role in this project was to implement the second stage, in which input images are color corrected based on the strongest SIFT features across all inputs to ensure that every frame is consistent in white balance, tone, and exposure. The method I used is adapted from the paper Efficient and Robust Color Consistency for Community Photo Collections by Park et al. You can find our open-source code and more details about our implementation below.
* All content on this page is based on our technical write-up. Please refer to it (link below) for more technical details and the complete list of citations.
Computer Vision Researcher & Software Engineer
Feb - May 2019
Jiaju Ma, Michael Mao, James Li
Python, NumPy, scikit-image, scikit-video, OpenCV, cyvlfeat, MATLAB, Computer Vision Toolbox, MACE
As mentioned above, the current process of creating hyperlapse videos can be both time and resource consuming. Specifically, photos taken at different locations can vary in color due to changes in lighting conditions, and, if stabilizers or tripods are not used, the footage can suffer from stabilization issues when the photos are stitched into videos. These issues become even more prominent when converting raw handheld footage of someone traveling through a space into a hyperlapse.
We created a three-stage pipeline to help people produce high-quality hyperlapse videos. Either a series of photos or a video can be used as input. If the input is a video, the first stage extracts optimal frames from it by minimizing a cost matrix. In the second stage, the input frames are color corrected according to SIFT features extracted from each frame so that colors are consistent across images. Finally, the third stage takes the processed images and performs camera path stabilization on them to output the final video.
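At a high level, the pipeline simply chains these three stages together. The sketch below illustrates the flow; the function names (select_frames, correct_colors, stabilize, load_image_sequence) are illustrative placeholders, not the actual API of our codebase.

```python
# Illustrative top-level flow of the HUSTL pipeline. All stage functions
# here are hypothetical placeholders for the real implementations.
import numpy as np
import skvideo.io  # scikit-video handles reading and writing video

def run_hustl(input_path, output_path, is_video=True, speedup=8):
    if is_video:
        frames = list(skvideo.io.vread(input_path))  # raw handheld footage
        frames = select_frames(frames, speedup)      # stage 1: optimal frame selection
    else:
        frames = load_image_sequence(input_path)     # photo input skips stage 1
    frames = correct_colors(frames)                  # stage 2: color consistency
    frames = stabilize(frames)                       # stage 3: bundled-path stabilization
    skvideo.io.vwrite(output_path, np.array(frames))
```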
This stage selects the frames that compose the smoothest camera movement and the most consistent frame rate in the output video. The implementation is adapted from the paper Real-Time Hyperlapse Creation via Optimal Frame Selection by Joshi et al. and consists of three steps: frame matching, cost building, and frame selection. In frame matching, we extract SIFT features from each frame and match feature pairs between frames to estimate the homography between every pair of frames within a given window size. In cost building, we compute a motion cost that measures the similarity between frames from the alignment and overlap costs. Finally, optimal frames are selected based on the motion cost along with penalties for factors like sudden movements and incoherent frame rates.
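To make the stage concrete, here is a condensed sketch under a few assumptions: OpenCV's SIFT implementation stands in for cyvlfeat, the motion cost is reduced to the overlap term from Joshi et al. (displacement of the frame center under the estimated homography), and the window size and weights are illustrative defaults rather than our tuned values.

```python
import cv2
import numpy as np

def motion_cost(feat_a, feat_b, shape, matcher):
    """Overlap cost: how far the frame center moves under the homography
    estimated from matched SIFT features."""
    (kp_a, des_a), (kp_b, des_b) = feat_a, feat_b
    if des_a is None or des_b is None:
        return 1e6
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]  # Lowe ratio test
    if len(good) < 4:
        return 1e6  # too few matches to fit a homography
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return 1e6
    h, w = shape[:2]
    c = H @ np.array([w / 2.0, h / 2.0, 1.0])      # warp the frame center
    return np.linalg.norm(c[:2] / c[2] - [w / 2.0, h / 2.0])

def select_frames(frames, speedup, window=8, lam=0.1):
    """Dynamic program over a sliding window: choose a path of frames that
    minimizes motion cost plus a speed cost penalizing deviation from the
    target frame rate."""
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher()
    feats = [sift.detectAndCompute(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), None)
             for f in frames]
    n = len(frames)
    D = np.full(n, np.inf)
    D[0] = 0.0
    prev = np.zeros(n, dtype=int)
    for j in range(1, n):
        for i in range(max(0, j - window), j):
            c = motion_cost(feats[i], feats[j], frames[0].shape, matcher)
            c += lam * abs((j - i) - speedup) ** 2  # speed cost
            if D[i] + c < D[j]:
                D[j], prev[j] = D[i] + c, i
    path, j = [n - 1], n - 1                        # trace the optimal path back
    while j > 0:
        j = prev[j]
        path.append(j)
    return [frames[k] for k in reversed(path)]
```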
The second stage of the pipeline takes in a series of images and applies color adjustments (white balance, color tone, and gamma) to the inputs through a global color correction model. The method we used in this stage is adapted from the paper Efficient and Robust Color Consistency for Community Photo Collections by Park et al. SIFT features are extracted from each input image, and similar features are matched to create unique matching pairs. The feature pairs are used to construct an undirected match graph, and color patches are extracted from the images based on the match graph's maximal cliques of size 2 and above. These patches are then combined into an observation matrix that decomposes into a color coefficient matrix, an albedo matrix, and a residual matrix. The observation matrix is factorized with the low-rank matrix decomposition technique proposed in the paper Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-rank Matrix Decomposition by Cabral et al., and the recovered color models are then applied to all input images to achieve coherent color consistency.
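The sketch below shows one way the match graph and observation matrix could be assembled. networkx stands in here for our own graph code, patch extraction is reduced to a mean color around each matched keypoint (the full method accounts for feature scale and orientation), and the function names are illustrative.

```python
import networkx as nx
import numpy as np

def build_match_graph(matches):
    """matches: iterable of ((img_i, feat_i), (img_j, feat_j)) pairs, where
    each node identifies one SIFT feature in one image."""
    G = nx.Graph()
    G.add_edges_from(matches)
    return G

def patch_color(image, pt, radius=7):
    """Mean RGB of a small patch around a matched keypoint (simplified)."""
    x, y = int(pt[0]), int(pt[1])
    patch = image[max(0, y - radius):y + radius + 1,
                  max(0, x - radius):x + radius + 1]
    return patch.reshape(-1, 3).mean(axis=0)

def observation_matrix(images, keypoints, G):
    """Each maximal clique of size >= 2 is treated as one scene point seen
    in several images; its patch colors form one column of the observation
    matrix, with missing observations left as NaN. keypoints[i] holds the
    OpenCV KeyPoint list for image i."""
    n = len(images)
    cols = []
    for clique in nx.find_cliques(G):  # maximal cliques of the match graph
        if len(clique) < 2:
            continue
        col = np.full((n, 3), np.nan)
        for img_idx, feat_idx in clique:
            pt = keypoints[img_idx][feat_idx].pt
            col[img_idx] = patch_color(images[img_idx], pt)
        cols.append(col)
    return np.stack(cols, axis=1)  # shape: (n_images, n_cliques, 3)
```

Each column of the resulting matrix corresponds to one scene point observed from several images; the low-rank decomposition step then recovers per-image color coefficients from these columns despite the missing (NaN) entries.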
For the final stage of the pipeline, we adapted the stabilization algorithm from the paper Bundled Camera Paths for Video Stabilization by Liu et al. Each input frame is split into a grid of cells, and the camera path of each cell is built up from the product of homographies between corresponding cells in adjacent frames. A smooth overall path for the stabilized footage is then computed by minimizing a quadratic objective that trades off fidelity to the original path against smoothness. Finally, the frames go through As-Similar-As-Possible warping with shape preservation to produce the final hyperlapse video, as described in the paper Generalized As-Similar-As-Possible Warping with Applications in Digital Photography by Chen et al.
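As an illustration of the path smoothing step, the sketch below minimizes the quadratic objective for a single cell's camera path with a fixed-point (Jacobi-style) iteration, in the spirit of Liu et al.; the spatial term coupling neighboring cells is omitted, and the parameter values are illustrative defaults.

```python
import numpy as np

def smooth_path(C, lam=5.0, window=15, sigma=10.0, iters=20):
    """Smooth one cell's camera path C (n frames x d parameters, e.g.
    flattened 3x3 cumulative homographies) by minimizing
        sum_t ||P(t) - C(t)||^2 + lam * sum_t sum_r w(t,r) ||P(t) - P(r)||^2
    over a temporal window, via Jacobi-style fixed-point updates."""
    n = len(C)
    P = C.astype(float).copy()
    for _ in range(iters):
        P_new = np.empty_like(P)
        for t in range(n):
            lo, hi = max(0, t - window), min(n, t + window + 1)
            r = np.arange(lo, hi)
            r = r[r != t]                                   # temporal neighbors of t
            w = np.exp(-((r - t) ** 2) / (2 * sigma ** 2))  # Gaussian weights
            gamma = 1.0 + lam * w.sum()
            P_new[t] = (C[t] + lam * (w[:, None] * P[r]).sum(axis=0)) / gamma
        P = P_new
    return P
```

Once the rows are reshaped back into 3x3 homographies, the stabilizing update for a cell at frame t is the transform taking the original path to the smoothed one, i.e. B(t) = C(t)^-1 P(t).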
To test and evaluate our pipeline, we created our own dataset by using a commercial DSLR to take photos and videos without any ancillary equipment. As you can see in the baseline videos below, this raw footage is rough and shaky. The resulting hyperlapse videos produced by our pipeline are shown on the right, and the last row of videos shows how the image warping works to create stabilized shots.
Baseline
Result
Baseline
Result