Introduction
Virtual YouTubers, better known as vtubers, have become a dominant presence on the internet. Pioneered by Japanese content creator Kizuna Ai, what started as an internet niche has crossed over into mainstream media. For example, hololive vtuber Gawr Gura recently performed “Take Me Out to the Ball Game” for the Los Angeles Dodgers, and Puerto Rican vtuber Ironmouse broke Kai Cenat’s record for most active subscribers on Twitch.
Each vtuber uses a model, either 2D or 3D, to portray themselves as a cartoon character when live. Most often, vtubers use anime-styled proportions to pay homage to the Japanese origin of the craft. While often labeled as “cringe” by outsiders, the technical skill and creativity required of a vtuber artist are immense; models are often priced at two to eight thousand dollars, a fitting price for a product that often takes hundreds of hours to create. Walking through the process of vtuber creation showcases the fusion of art and technology in the entertainment world. Hopefully, the insight this article provides into the vtuber industry helps remove its “cringe” label and shifts the focus onto the incredible artistry of these models.
Starting With a Reference Sheet

Caption: A reference sheet for hololive vtuber Hakos Baelz, created and designed by famous Japanese artist Mika Pikazo. Image owned by hololive, property of Cover Corp.
The reference process is usually the same for both 2D and 3D vtubers. For 2D vtubers, the initial planning simplifies the process and allows the artist to decide how each element of the character will move when animated. By contrast, 3D vtubers use the reference sheet for the initial stages of the modeling process. The initial block-outs are done directly on top of a projection of the reference sheet, reducing anatomy mistakes when creating the model. Usually, 3D artists will request a side view as well, something not pictured in the image above.
Making the Model

Images borrowed from the Live2D Cubism editor and ふさこ / 3D自習室 on YouTube.
Once the artist finalizes the reference image, work on the model itself begins. For 2D vtubers, this means drawing hundreds, if not thousands, of layers that adapt to the streamer’s movements. The number of layers, or separations, in a 2D model directly correlates with the amount of movement the model can be given during the “rigging” phase. For 3D vtubers, the mesh is modeled and must then be assigned textures and materials to add color. Making the model is usually the most time-consuming step, as artists often spend over 50 hours on it alone.
Rigging

Images borrowed from the Live2D user manual and ふさこ / 3D自習室 on YouTube.
In animation, “rigging” is the term used to describe the assignment of movement data (or “bones” in 3D) to the individual vertices or pixels of a model. Without rigging, vtuber models would be unable to move in response to facial tracking data. In 2D, rigging is done by warping the pixels of the drawing, changing the perspective and giving the character a sense of dimension. In a 3D workflow, rigging relies on assigning individual vertices to a “bone” and determining how strongly that bone influences each vertex, through processes such as “weight painting” or “envelope weights,” depending on the software. While the two methods look very different, both result in a model capable of reacting to the streamer’s movements.
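To make the idea of bone influence more concrete, here is a minimal Python sketch of weighted blending between bones, a technique commonly called linear blend skinning. It is an illustration only, not how any particular rigging program works internally. The function names, the two-bone "arm," and the weight values are all hypothetical, and the example uses 2D coordinates to keep the arithmetic short.

```python
import math


def rotate_about(point, pivot, angle_rad):
    """Rotate a 2D point counter-clockwise around a pivot."""
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return (pivot[0] + dx * cos_a - dy * sin_a,
            pivot[1] + dx * sin_a + dy * cos_a)


def skin_vertex(vertex, bones, weights):
    """Blend the positions each bone would move the vertex to.

    bones:   {name: (pivot, angle_rad)}  (hypothetical, simplified bone format)
    weights: {name: influence}           (the values an artist would "paint")
    """
    total = sum(weights.values())
    x = y = 0.0
    for name, weight in weights.items():
        pivot, angle = bones[name]
        moved_x, moved_y = rotate_about(vertex, pivot, angle)
        # Normalize so the influences on one vertex always sum to 1.
        x += (weight / total) * moved_x
        y += (weight / total) * moved_y
    return (x, y)


# Hypothetical arm: the upper arm stays still, the forearm bends 90 degrees
# around an elbow located at (1, 0).
bones = {
    "upper_arm": ((0.0, 0.0), 0.0),
    "forearm": ((1.0, 0.0), math.radians(90)),
}

# A wrist vertex follows the forearm completely.
wrist = skin_vertex((2.0, 0.0), bones, {"forearm": 1.0})

# A vertex near the elbow is shared evenly, so it lands halfway between
# where each bone alone would place it.
near_elbow = skin_vertex((1.5, 0.0), bones, {"upper_arm": 0.5, "forearm": 0.5})

print(wrist, near_elbow)
```

In this sketch, the shared vertex near the elbow ends up between the two bones' results, which is why carefully assigned weights produce a smooth bend instead of a sharp crease.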
Tracking and Streaming

Image borrowed from VTube Studio website.
Once the artist and rigger have finished the model, the streamer uses a tracking program to make their avatar appear on broadcast. Whether 2D or 3D, these tracking programs usually rely on a webcam to map important facial or bodily features, translating the streamer’s motions to the model through the “rig.” The live video of the model is then placed as an overlay in a traditional livestreaming program. High-end setups sometimes rely on motion-tracking suits or headgear to output more accurate data; however, such devices are seldom used by aspiring vtubers, primarily due to their cost. No matter how a model is created, tracking software is what bridges the gap between the model and the stream, connecting a vtuber with their audience and allowing them to create content.
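As a rough illustration of how a tracked measurement might be turned into a rig movement, the Python sketch below maps a raw "mouth openness" reading onto a 0-to-1 rig parameter and smooths it between frames. This is not the API of VTube Studio or any other tracking program. The class name, the calibration range, the smoothing factor, and the sample readings are all assumptions made for the example.

```python
def normalize(value, low, high):
    """Map a raw measurement onto 0..1, clamping anything outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    t = (value - low) / (high - low)
    return max(0.0, min(1.0, t))


class SmoothedParameter:
    """A hypothetical rig parameter that eases toward each new target.

    smoothing=0.0 follows the tracker exactly; values closer to 1.0 react
    more slowly but hide more frame-to-frame jitter.
    """

    def __init__(self, smoothing=0.5, initial=0.0):
        self.smoothing = smoothing
        self.value = initial

    def update(self, target):
        # Exponential smoothing: move part of the way toward the target.
        self.value += (1.0 - self.smoothing) * (target - self.value)
        return self.value


# Assumed calibration: a lip gap of 0.02 (relative to face height) counts as
# closed and 0.18 counts as fully open. These numbers are made up.
mouth_open = SmoothedParameter(smoothing=0.5)
raw_frames = [0.02, 0.10, 0.18, 0.30]

outputs = []
for raw in raw_frames:
    target = normalize(raw, low=0.02, high=0.18)
    outputs.append(mouth_open.update(target))

print(outputs)
```

The clamping step keeps an unusually large reading, like the last frame here, from pushing the model past its fully open pose, and the smoothing step trades a little responsiveness for steadier motion on screen.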