Tobias Kirschstein
PhD Student, Technical University of Munich
About Me
I am a PhD student in the Visual Computing & Artificial Intelligence Group at the Technical University of Munich, supervised by Prof. Matthias Nießner. During summer 2024, I am completing a Research Scientist internship at Meta Reality Labs under the guidance of Shunsuke Saito and Javier Romero.
I am the creator and maintainer of the NeRSemble dataset, for which I built a custom multi-view setup with 16 video cameras and recorded facial expressions of over 250 individuals. Since its release, the dataset has enabled various research projects on 3D head avatars.
Before starting my PhD, I completed an M.Sc. degree in Informatics at TU Munich, with my Master's thesis focusing on Neural Rendering for novel-view synthesis of outdoor scenes using sparse point clouds. I obtained B.Sc. degrees in both Mathematics and Computer Science at the University of Passau, where I studied how Deep Learning can be used for emotion recognition from physiological signals under the supervision of Prof. Björn Schuller.
My current research interests lie in Neural Rendering, 3D Scene Representations, Dynamic 3D Reconstruction and Animatable 3D Head Avatars.
NeRSemble dataset
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
- DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
- GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
- High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
- DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
- GEM: Gaussian Eigen Models for Human Heads
- NPGA: Neural Parametric Gaussian Avatars
- 3D Gaussian Parametric Head Model
- Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
- HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors
- SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
- VOODOO XP: Expressive One-Shot Head Reenactment for VR
- Stable Video Portraits
Publications
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
DiffusionAvatars uses diffusion-based, deferred neural rendering to translate geometric cues from an underlying neural parametric head model (NPHM) into photorealistic renderings. The NPHM provides accurate control over facial expressions, while the deferred neural rendering leverages the 2D prior of Stable Diffusion to generate compelling images.
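The core idea can be illustrated with a small sketch: screen-space geometry buffers rendered from the head model condition a denoiser that is trained with the standard noise-prediction objective. All names and layer sizes below are illustrative assumptions, not the paper's code; the tiny convolutional net stands in for a full diffusion UNet.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for a geometry-conditioned diffusion UNet."""
    def __init__(self, img_ch=3, cond_ch=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + cond_ch + 1, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_ch, 3, padding=1),
        )

    def forward(self, noisy, cond, t):
        # broadcast the normalized diffusion time as an extra channel
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *noisy.shape[-2:])
        return self.net(torch.cat([noisy, cond, t_map], dim=1))

denoiser = ConditionalDenoiser()
image = torch.rand(2, 3, 64, 64)   # ground-truth frame
cond = torch.rand(2, 9, 64, 64)    # e.g. rasterized normals, depth, canonical coords
t = torch.rand(2)                  # diffusion time in [0, 1]
noise = torch.randn_like(image)
noisy = (1 - t.view(-1, 1, 1, 1)) * image + t.view(-1, 1, 1, 1) * noise
loss = ((denoiser(noisy, cond, t) - noise) ** 2).mean()  # eps-prediction loss
loss.backward()
```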
Cite
@inproceedings{kirschstein2024diffusionavatars,
title={DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars},
author={Kirschstein, Tobias and Giebenhain, Simon and Nie{\ss}ner, Matthias},
booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
GaussianAvatars rigs 3D Gaussians to a parametric mesh model for photorealistic avatar creation and animation. During avatar reconstruction, the morphable model parameters and Gaussian splats are optimized jointly in an end-to-end fashion from video recordings. GaussianAvatars can then be animated through expression transfer from a driving sequence or by manually changing the morphable model parameters.
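A minimal sketch of the rigging idea, under assumed simplifications (per-triangle frames from centroid, edge direction, and normal; a crude edge-length scale): each Gaussian stores a mean in the local frame of a parent triangle and is carried along when the morphable model poses the mesh. Names like `triangle_frames` are hypothetical.

```python
import torch
import torch.nn.functional as F

def triangle_frames(verts, faces):
    """Per-triangle rotation R, origin t, and scale k of a triangle mesh."""
    v0, v1, v2 = (verts[faces[:, i]] for i in range(3))
    e1 = F.normalize(v1 - v0, dim=-1)
    n = F.normalize(torch.cross(v1 - v0, v2 - v0, dim=-1), dim=-1)
    e2 = torch.cross(n, e1, dim=-1)
    R = torch.stack([e1, e2, n], dim=-1)      # (F, 3, 3) local basis
    t = (v0 + v1 + v2) / 3                    # triangle centroid
    k = (v1 - v0).norm(dim=-1, keepdim=True)  # crude per-triangle scale
    return R, t, k

def posed_gaussians(local_mu, parent, verts, faces):
    """Map Gaussian means from local triangle frames into world space."""
    R, t, k = triangle_frames(verts, faces)
    R, t, k = R[parent], t[parent], k[parent]
    return torch.einsum('nij,nj->ni', R, local_mu * k) + t

verts = torch.rand(100, 3)                    # posed mesh from the morphable model
faces = torch.randint(0, 100, (180, 3))
local_mu = torch.randn(5000, 3) * 0.01        # learnable local means
parent = torch.randint(0, 180, (5000,))       # parent triangle per Gaussian
world_mu = posed_gaussians(local_mu, parent, verts, faces)
```

Because the local means (and, in the full method, rotations and scales) are optimized jointly with the morphable model parameters, the splats stay attached to the surface across expressions.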
Cite
@article{qian2023gaussianavatars,
title={GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians},
author={Qian, Shenhan and Kirschstein, Tobias and Schoneveld, Liam and Davoli, Davide and Giebenhain, Simon and Nie{\ss}ner, Matthias},
journal={arXiv preprint arXiv:2312.02069},
year={2023}
}
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
MonoNPHM is a neural parametric head model that disentangles geometry, appearance, and facial expression into three separate latent spaces. Using MonoNPHM as a prior, we tackle dynamic 3D head reconstruction from monocular RGB videos via inverse, SDF-based volumetric rendering.
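For readers unfamiliar with SDF-based volume rendering, here is a hedged sketch of the general recipe (a VolSDF-style Laplace density; MonoNPHM's exact formulation differs): the signed distance is converted to a volume density, which is then alpha-composited along camera rays so that photometric losses can be backpropagated into the SDF.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def laplace_cdf(s, beta):
    # CDF of a zero-mean Laplace distribution with scale beta
    return torch.where(s <= 0, 0.5 * torch.exp(s / beta),
                       1 - 0.5 * torch.exp(-s / beta))

def sdf_to_density(sdf, alpha=50.0, beta=0.05):
    # negative SDF (inside the surface) maps to high density
    return alpha * laplace_cdf(-sdf, beta)

def render_rays(sdf_fn, rgb_net, rays_o, rays_d, near=0.5, far=1.5, n=64):
    tvals = torch.linspace(near, far, n)
    pts = rays_o[:, None] + tvals[None, :, None] * rays_d[:, None]  # (R, n, 3)
    flat = pts.reshape(-1, 3)
    sigma = sdf_to_density(sdf_fn(flat)).reshape(pts.shape[:2])     # (R, n)
    delta = (far - near) / n
    alpha = 1 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha[:, :-1] + 1e-10], 1), 1)
    weights = alpha * trans                                         # (R, n)
    rgb = rgb_net(flat).reshape(*pts.shape[:2], 3)
    return (weights[..., None] * rgb).sum(dim=1)                    # (R, 3)

sdf_net = nn.Sequential(nn.Linear(3, 64), nn.Softplus(), nn.Linear(64, 1))
rgb_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())
rays_o = torch.zeros(8, 3)
rays_d = F.normalize(torch.randn(8, 3), dim=-1)
color = render_rays(lambda x: sdf_net(x).squeeze(-1), rgb_net, rays_o, rays_d)
```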
Cite
@inproceedings{giebenhain2024mononphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={MonoNPHM: Dynamic Head Reconstruction from Monocular Videos},
booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
NeRSemble reconstructs high-fidelity dynamic radiance fields of human heads. We combine a deformation field for coarse movements with an ensemble of 3D multi-resolution hash encodings, which act as expression-dependent volumetric textures that model fine-grained, expression-dependent details. Additionally, we propose a new 16-camera multi-view capture dataset (7.1 MP resolution, 73 frames per second) containing 4,700 sequences of more than 220 human subjects.
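The ensemble idea can be sketched as follows, with dense feature grids standing in for the paper's multi-resolution hash encodings (an assumed simplification): per-frame weights blend several volumetric feature fields into one expression-dependent field that is queried at deformed sample points.

```python
import torch
import torch.nn.functional as F

n_grids, feat, res = 4, 8, 32
grids = torch.nn.Parameter(torch.randn(n_grids, feat, res, res, res) * 0.01)
frame_weights = torch.nn.Parameter(torch.zeros(1000, n_grids))  # one row per frame

def blended_features(x, frame_idx):
    """x: (N, 3) deformed points in [-1, 1]^3; returns (N, feat)."""
    w = torch.softmax(frame_weights[frame_idx], dim=-1)          # (n_grids,)
    g = x.view(1, 1, 1, -1, 3)                                   # grid_sample layout
    samples = F.grid_sample(grids, g.expand(n_grids, -1, -1, -1, -1),
                            align_corners=True)                  # (n_grids, feat, 1, 1, N)
    samples = samples.squeeze(2).squeeze(2).permute(0, 2, 1)     # (n_grids, N, feat)
    return (w[:, None, None] * samples).sum(dim=0)

x = torch.rand(4096, 3) * 2 - 1
feats = blended_features(x, frame_idx=42)
```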
Cite
@article{kirschstein2023nersemble,
author = {Kirschstein, Tobias and Qian, Shenhan and Giebenhain, Simon and Walter, Tim and Nie\ss{}ner, Matthias},
title = {NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads},
year = {2023},
issue_date = {August 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {42},
number = {4},
issn = {0730-0301},
url = {https://doi.org/10.1145/3592455},
doi = {10.1145/3592455},
journal = {ACM Trans. Graph.},
month = {jul},
articleno = {161},
numpages = {14},
}
NPHM: Learning Neural Parametric Head Models
NPHM is a field-based neural parametric model for human heads that represents identity geometry implicitly in a canonical space and models expressions as forward deformations. The SDF in canonical space is represented as an ensemble of local MLPs centered around facial anchor points. To train our model, we captured a large dataset of complete head geometry containing over 250 people in 23 expressions each, using high-quality structured light scanners.
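A hedged sketch of the ensemble-of-local-MLPs construction (layer sizes and the Gaussian blending below are illustrative assumptions, not NPHM's exact design): each small MLP sees coordinates relative to its own anchor, and the local SDF predictions are blended with distance-based weights.

```python
import torch
import torch.nn as nn

class LocalEnsembleSDF(nn.Module):
    def __init__(self, n_anchors=8, hidden=64):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(n_anchors, 3) * 0.1)
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_anchors))

    def forward(self, x, sigma=0.1):
        # Gaussian blend weights from squared distance to each anchor
        d2 = ((x[:, None, :] - self.anchors[None]) ** 2).sum(-1)   # (N, A)
        w = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)
        # each local MLP sees coordinates relative to its own anchor
        preds = torch.stack([mlp(x - a).squeeze(-1)
                             for mlp, a in zip(self.mlps, self.anchors)], dim=-1)
        return (w * preds).sum(-1)                                 # blended SDF

sdf = LocalEnsembleSDF()
values = sdf(torch.rand(1024, 3) * 2 - 1)                          # (1024,)
```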
Cite
@inproceedings{giebenhain2023nphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={Learning Neural Parametric Head Models},
booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year = {2023}
}
Language-Agnostic Representation Learning of Source Code from Structure and Context
We present CodeTransformer, which combines source code (Context) and parsed abstract syntax trees (ASTs; Structure) for representation learning on code. Context and Structure are two complementary representations of the same computer program, and we show the benefit of combining both for the task of method name prediction. To achieve this, we propose an extension to Transformer architectures that can handle both graph and sequential inputs.
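One common way to inject graph structure into sequence attention, sketched below, is to embed pairwise AST distances between tokens and add them to the attention logits. This is the general mechanism only; CodeTransformer's actual relation encoding is more elaborate, and the names here are hypothetical.

```python
import torch
import torch.nn as nn

class DistanceBiasedAttention(nn.Module):
    def __init__(self, dim=64, n_buckets=32):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.bias = nn.Embedding(n_buckets, 1)  # one logit offset per distance bucket

    def forward(self, tokens, dist_buckets):
        # tokens: (N, dim); dist_buckets: (N, N) bucketized AST distances
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        logits = q @ k.t() / q.shape[-1] ** 0.5
        logits = logits + self.bias(dist_buckets).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ v

attn = DistanceBiasedAttention()
tokens = torch.randn(10, 64)            # token embeddings of a method body
dist = torch.randint(0, 32, (10, 10))   # e.g. shortest-path distance in the AST
out = attn(tokens, dist)
```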
Cite
@inproceedings{zuegner2021codetransformer,
author = {Daniel Z{\"{u}}gner and
Tobias Kirschstein and
Michele Catasta and
Jure Leskovec and
Stephan G{\"{u}}nnemann},
title = {Language-Agnostic Representation Learning of Source Code from Structure and Context},
booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
Virtual Event, Austria, May 3-7, 2021},
year = {2021},
url = {https://openreview.net/forum?id=Xh5eMZVONGF},
}
End-to-End Learning for Dimensional Emotion Recognition from Physiological Signals
We show that end-to-end Deep Learning can replace traditional feature engineering in the signal processing domain. Not only does a combination of convolutional layers and LSTMs perform better on the task of emotion recognition, but we also demonstrate that some cells' activations in the convolutional network are highly correlated with hand-crafted features.
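A minimal sketch of the conv + LSTM architecture described above (layer sizes are illustrative, not the paper's): raw 1D physiological signals in, per-timestep valence/arousal predictions out.

```python
import torch
import torch.nn as nn

class ConvLSTM(nn.Module):
    def __init__(self, in_ch=1, conv_ch=32, hidden=64, out_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, conv_ch, kernel_size=8, stride=2), nn.ReLU(),
            nn.Conv1d(conv_ch, conv_ch, kernel_size=8, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(conv_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)  # valence and arousal

    def forward(self, x):                       # x: (B, 1, T) raw signal
        h = self.conv(x).transpose(1, 2)        # (B, T', conv_ch)
        h, _ = self.lstm(h)
        return self.head(h)                     # (B, T', 2)

model = ConvLSTM()
signal = torch.randn(4, 1, 1024)                # e.g. ECG / EDA windows
pred = model(signal)
```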
Cite
@inproceedings{keren2017end,
title={End-to-end learning for dimensional emotion recognition from physiological signals},
author={Keren, Gil and Kirschstein, Tobias and Marchi, Erik and Ringeval, Fabien and Schuller, Bj{\"o}rn},
booktitle={2017 IEEE International Conference on Multimedia and Expo (ICME)},
pages={985--990},
year={2017},
organization={IEEE}
}
Teaching
3D Scanning & Spatial Learning Practical
Offered and supervised projects for teams of 2-3 students on the following topics:
- Codec Avatars for Teleconferencing
- Intuitive Face Animation through Sparse Deformation Components
- Multi-view Stereo via Inverse Rendering
- Synthetic 3D Hair Reconstruction
3D Scanning & Spatial Learning Practical
Offered and supervised projects for teams of 2-4 students on the following topics:
- 3D Face Reconstruction and Tracking
- Intuitive Speech-driven Face Animation
- Reconstructing surfaces with NeuS and Deep Marching Tetrahedra
- Multi-view 3D Hair Reconstruction
Reviewing
CVPR
- 2024: 4 papers
SIGGRAPH
- 2024: 2 papers
SIGGRAPH Asia
- 2024: 5 papers