VR cameras capture video in various immersive formats and are available in consumer and professional models. You’ll need to consider formats, feature sets and budget before choosing one.
Before starting an immersive media project, it is critical to determine the format in which the project will be delivered, because camera selection depends on the target format. Immersive media is usually video-based, with a wide field of view (180 or 360 degrees), and is either monoscopic (2D) or stereoscopic (3D). To understand these formats see [LINK] Immersive Media Formats[/LINK].
VR cameras are available as consumer, prosumer, and professional models. In general, consumer cameras on the market capture monoscopic (2D) 360 video and cost hundreds of dollars. Cameras like the models sold by GoPro and Insta360 double as action cameras: because they capture everything around them, they don’t need to be actively aimed while recording. 360 cameras record the entire scene in every direction via multiple overlapping fisheye lenses. Consumer cameras usually have only two lenses arranged back-to-back, but higher-end professional 360 cameras, which can cost many thousands of dollars, usually have more lenses and can also record stereoscopic (3D) 360 video.
3D-180 (also known as “VR180”) cameras capture stereoscopic 180-degree video with a pair of side-by-side fisheye lenses. Only half the world is captured, and when viewed in a VR headset, the rear half of the world is black. Most 3D-180 cameras have an IAD (interaxial distance) between the two lenses that is close to the human IPD (interpupillary distance), which results in human-scale stereoscopic 3D experiences when viewed in VR.
From left to right: Canon R5 C camera with RF 2.5mm Dual Fisheye Lens for 3D-180, Ricoh Theta Z1 dual lens camera for 2D 360, Insta360 Pro six lens camera for either 2D 360 or 3D-360. Image: Steve Cooper
3D 360 vs. 2D 360
3D-360 is captured with high-end camera rigs that generally have six or more lenses. These cameras can generate a large amount of data, and high-resolution 3D-360 video requires significant compute resources to stitch, edit and render. High-quality 3D-360 video commonly targets final resolutions of 8K or greater.
Monoscopic 360 is easier to produce, but looks flat when viewed in VR headsets. It can be captured by a wider range of cameras, from consumer-grade, dual-lens cameras to professional multi-lens cameras. Stitching and editing require fewer compute resources, and at the same per-eye resolution the final videos contain half as many pixels as 3D-360 videos because they only represent one “eye” versus the two eyes required for stereoscopic 3D.
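The pixel-count difference between monoscopic and stereoscopic 360 can be sketched with a little arithmetic. The example below is a minimal illustration, not a description of any particular camera or player: the function name `packed_frame_size`, the per-eye size of 5760x2880, and the assumption that both eye views are packed into one frame (top-bottom or side-by-side) are all chosen here for illustration.

```python
def packed_frame_size(eye_width, eye_height, layout="top-bottom"):
    """Return (width, height) of a single frame that holds both eye views.

    Hypothetical helper: assumes the two eye views are stored in one frame,
    either stacked vertically or placed next to each other.
    """
    if layout == "top-bottom":
        return eye_width, eye_height * 2
    if layout == "side-by-side":
        return eye_width * 2, eye_height
    raise ValueError(f"unknown layout: {layout}")


# Assumed per-eye equirectangular size (2:1 aspect ratio), for illustration only.
eye_w, eye_h = 5760, 2880

mono_pixels = eye_w * eye_h                      # one view
stereo_w, stereo_h = packed_frame_size(eye_w, eye_h)
stereo_pixels = stereo_w * stereo_h              # two views in one frame

print(f"mono:   {eye_w}x{eye_h} = {mono_pixels:,} pixels")
print(f"stereo: {stereo_w}x{stereo_h} = {stereo_pixels:,} pixels")
print(f"mono / stereo = {mono_pixels / stereo_pixels}")
```

Under these assumptions the stereoscopic frame carries exactly twice the pixels of the monoscopic one, which is why stitching, editing and rendering 3D-360 is heavier at the same per-eye quality.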
Multi-lens 360 cameras such as the Insta360 Pro 2 make use of very large overlaps from adjacent lenses, a feature that allows for high-quality 360 and 3D-360 capture. Image: Steve Cooper/Keith Martin
High-end VR camera systems that capture 3D-180 video are becoming increasingly common in immersive media production. To generate 3D-180 video, a pair of left/right fisheye lenses records the scene as a pair of hemispherical videos. Those frames are de-warped and arranged as a pair of left/right stereoscopic images, similar to how traditional 3D movies are presented. The image pairs must be well aligned and in perfect sync during capture.
Lenses such as the Canon RF 2.5mm Dual Fisheye Lens are made specifically to capture 3D-180, and can be used with traditional camera bodies such as the Canon R5 and R5 C.
30fps (or 29.97fps) is the recommended minimum frame rate for VR, with 60fps yielding smoother playback in VR headsets. However, capturing higher frame rates generates proportionately larger amounts of video data and requires more post-processing. At 30fps, viewers may notice judder or strobing when large objects move across the field of view. Immersive video is more comfortable to view at 60fps, but the tradeoffs between frame rate, resolution, and distribution end points should be carefully considered.
Note that 24fps and 30fps rectilinear (traditional) video is less susceptible to judder during playback because it’s usually experienced in a small virtual display in VR. However, if a rectilinear video is played back on a very large virtual screen, judder can be a problem.
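To make the frame rate tradeoff concrete, the sketch below compares the number of pixels a camera and post-production pipeline must handle per second at 30fps and 60fps. It is a rough illustration only: it counts uncompressed pixels, the function name `pixels_per_second` is made up for this example, and the 8192x4096 frame size is an assumed value. Real file sizes depend on codec and bitrate settings.

```python
def pixels_per_second(width, height, fps):
    """Uncompressed pixel throughput for a single video stream."""
    return width * height * fps


# Assumed final frame size, for illustration only.
width, height = 8192, 4096

rate_30 = pixels_per_second(width, height, 30)
rate_60 = pixels_per_second(width, height, 60)

print(f"30fps: {rate_30:,} pixels per second")
print(f"60fps: {rate_60:,} pixels per second")
print(f"60fps handles {rate_60 / rate_30:.1f}x the pixels of 30fps")
```

Doubling the frame rate doubles the pixel throughput at a fixed resolution, so a project with a fixed processing or storage budget may need to trade resolution against frame rate.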
Each VR camera model lists a final stitched resolution in its specifications. Meta Quest 2 supports playback up to 8192x4096 and 5760x5760, but the actual video resolution delivered to users will depend on where the video is hosted and whether it is streamed or played back locally.
To learn more about recommended and maximum video specifications for Meta Quest 2 headsets, see Encoding immersive videos for Meta Quest 2.
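When comparing camera specifications against these playback limits, a quick check like the following can help. This is a minimal sketch under a simplifying assumption: it treats each of the two maximum resolutions quoted above as a bounding box that the stitched frame must fit inside. The helper name `fits_quest2_playback` is hypothetical, and actual playback support also depends on codec, bitrate and frame rate, which this check ignores.

```python
# Maximum playback resolutions quoted in the text for Meta Quest 2.
QUEST2_MAX_RESOLUTIONS = [(8192, 4096), (5760, 5760)]


def fits_quest2_playback(width, height):
    """Return True if the frame fits inside at least one quoted maximum.

    Simplifying assumption: each maximum is treated as a bounding box.
    Codec, bitrate and frame rate limits are not considered.
    """
    return any(
        width <= max_w and height <= max_h
        for max_w, max_h in QUEST2_MAX_RESOLUTIONS
    )


# Example stitched resolutions (assumed values, not tied to specific cameras).
for size in [(7680, 3840), (5760, 5760), (8192, 8192)]:
    verdict = "fits" if fits_quest2_playback(*size) else "exceeds the quoted limits"
    print(f"{size[0]}x{size[1]}: {verdict}")
```

A camera whose stitched output exceeds these limits can still be used, but the final video would need to be downscaled for headset playback.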
Because they require multiple sensors and lenses, 360 cameras usually feature small to medium-sized sensors. While smaller sensors do not perform as well in low-light conditions, they allow camera designers to build smaller rigs with the lenses and sensors closer together. This minimizes parallax and can help to make the stitching process easier.
As an example, one of the larger off-the-shelf professional 3D-360 cameras has ten Micro 4/3rds (MFT) sensors and fisheye lenses; image quality is excellent, but due to the large distance between lenses and sensors, it is challenging to stitch comfortable 3D-360 video when there are close subjects, and output is not human scale when viewed in VR headsets. The camera is also extremely large, making it difficult to fit into smaller physical spaces.
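The relationship between lens spacing, subject distance and parallax can be sketched with basic geometry. The example below is an illustrative approximation only: it assumes a subject centered between two adjacent lenses and computes the angle between the two lines of sight to that subject. The function name `parallax_deg` and the spacing and distance values are assumptions made for this example, not measurements of any real camera.

```python
import math


def parallax_deg(lens_spacing_m, subject_distance_m):
    """Angle in degrees between two lenses' lines of sight to one subject.

    Assumes the subject is centered between the lenses, at the given
    distance from the midpoint of the line joining them.
    """
    half_angle = math.atan(lens_spacing_m / (2 * subject_distance_m))
    return math.degrees(2 * half_angle)


# Assumed lens spacings: a compact rig versus a large rig.
for spacing in (0.03, 0.15):
    for distance in (0.5, 1.0, 3.0):
        angle = parallax_deg(spacing, distance)
        print(f"spacing {spacing:.2f} m, subject at {distance:.1f} m: {angle:.2f} degrees")
```

The angle grows as lenses move apart and as subjects move closer, which matches the observation that large rigs are harder to stitch when subjects are close to the camera.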
Other cameras use 6 to 8 lenses, some with smaller sensors. These allow for closer subject distances, can capture closer-to-human-scale stereoscopic video, and they are compact enough to fit into smaller physical spaces.
Finally, at the smallest end of the spectrum, dual-lens cameras with 1-inch-type or smaller sensors are pocketable, capture monoscopic 360 video, and can fit into very compact spaces such as car interiors. They are commonly used as POV / action cameras to capture sports.
Although smaller sensors yield smaller cameras, they capture noisier video than larger sensors do, and have a smaller dynamic range (this can be challenging when shooting indoor scenes with bright windows, for example).