Tobias Kirschstein
PhD Student, Technical University of Munich
About Me
I am a PhD student in the Visual Computing & Artificial Intelligence Group at the Technical University of Munich, supervised by Prof. Matthias Nießner. During summer 2024, I am completing a Research Scientist internship at Meta Reality Labs under the guidance of Shunsuke Saito and Javier Romero.
I am the creator and maintainer of the NeRSemble dataset, for which I built a custom multi-view setup with 16 video cameras and recorded facial expressions of over 250 individuals. Since its release, the dataset has enabled various research projects on 3D head avatars.
Before starting my PhD, I completed an M.Sc. degree in Informatics at TU Munich, with my Master's thesis focusing on Neural Rendering for novel-view synthesis of outdoor scenes using sparse point clouds. I obtained B.Sc. degrees in both Mathematics and Computer Science at the University of Passau, where I studied how Deep Learning can be used for emotion recognition from physiological signals under the supervision of Prof. Björn Schuller.
My current research interests lie in Neural Rendering, 3D Scene Representations, Dynamic 3D Reconstruction and Animatable 3D Head Avatars.
NeRSemble dataset
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
- DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
- GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
- High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
- DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
- GEM: Gaussian Eigen Models for Human Heads
- NPGA: Neural Parametric Gaussian Avatars
- 3D Gaussian Parametric Head Model
- Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
- HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors
- SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
- VOODOO XP: Expressive One-Shot Head Reenactment for VR
- Stable Video Portraits
Publications
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
DiffusionAvatars uses diffusion-based, deferred neural rendering to translate geometric cues from an underlying neural parametric head model (NPHM) into photorealistic renderings. The NPHM provides accurate control over facial expressions, while the deferred neural rendering leverages the 2D prior of Stable Diffusion to generate compelling images.
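The core idea can be illustrated with a small sketch: screen-space geometry buffers rendered from the head model condition a denoiser that is trained with the standard noise-prediction objective. All names and layer sizes below are illustrative assumptions, not the paper's code; the tiny convolutional net stands in for a full diffusion UNet.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy stand-in for a geometry-conditioned diffusion UNet."""
    def __init__(self, img_ch=3, cond_ch=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + cond_ch + 1, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_ch, 3, padding=1),
        )

    def forward(self, noisy, cond, t):
        # broadcast the normalized diffusion time as an extra channel
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *noisy.shape[-2:])
        return self.net(torch.cat([noisy, cond, t_map], dim=1))

denoiser = ConditionalDenoiser()
image = torch.rand(2, 3, 64, 64)   # ground-truth frame
cond = torch.rand(2, 9, 64, 64)    # e.g. rasterized normals, depth, canonical coords
t = torch.rand(2)                  # diffusion time in [0, 1]
noise = torch.randn_like(image)
noisy = (1 - t.view(-1, 1, 1, 1)) * image + t.view(-1, 1, 1, 1) * noise
loss = ((denoiser(noisy, cond, t) - noise) ** 2).mean()  # eps-prediction loss
loss.backward()
```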
Cite
@inproceedings{kirschstein2024diffusionavatars,
title={DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars},
author={Kirschstein, Tobias and Giebenhain, Simon and Nie{\ss}ner, Matthias},
booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
GaussianAvatars rigs 3D Gaussians to a parametric mesh model for photorealistic avatar creation and animation. During avatar reconstruction, the morphable model parameters and Gaussian splats are optimized jointly in an end-to-end fashion from video recordings. GaussianAvatars can then be animated through expression transfer from a driving sequence or by manually changing the morphable model parameters.
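A minimal sketch of the rigging idea, under assumed simplifications (per-triangle frames from centroid, edge direction, and normal; a crude edge-length scale): each Gaussian stores a mean in the local frame of a parent triangle and is carried along when the morphable model poses the mesh. Names like `triangle_frames` are hypothetical.

```python
import torch
import torch.nn.functional as F

def triangle_frames(verts, faces):
    """Per-triangle rotation R, origin t, and scale k of a triangle mesh."""
    v0, v1, v2 = (verts[faces[:, i]] for i in range(3))
    e1 = F.normalize(v1 - v0, dim=-1)
    n = F.normalize(torch.cross(v1 - v0, v2 - v0, dim=-1), dim=-1)
    e2 = torch.cross(n, e1, dim=-1)
    R = torch.stack([e1, e2, n], dim=-1)      # (F, 3, 3) local basis
    t = (v0 + v1 + v2) / 3                    # triangle centroid
    k = (v1 - v0).norm(dim=-1, keepdim=True)  # crude per-triangle scale
    return R, t, k

def posed_gaussians(local_mu, parent, verts, faces):
    """Map Gaussian means from local triangle frames into world space."""
    R, t, k = triangle_frames(verts, faces)
    R, t, k = R[parent], t[parent], k[parent]
    return torch.einsum('nij,nj->ni', R, local_mu * k) + t

verts = torch.rand(100, 3)                    # posed mesh from the morphable model
faces = torch.randint(0, 100, (180, 3))
local_mu = torch.randn(5000, 3) * 0.01        # learnable local means
parent = torch.randint(0, 180, (5000,))       # parent triangle per Gaussian
world_mu = posed_gaussians(local_mu, parent, verts, faces)
```

Because the local means (and, in the full method, rotations and scales) are optimized jointly with the morphable model parameters, the splats stay attached to the surface across expressions.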
Cite
@article{qian2023gaussianavatars,
title={GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians},
author={Qian, Shenhan and Kirschstein, Tobias and Schoneveld, Liam and Davoli, Davide and Giebenhain, Simon and Nie{\ss}ner, Matthias},
journal={arXiv preprint arXiv:2312.02069},
year={2023}
}
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
MonoNPHM is a neural parametric head model that disentangles geometry, appearance, and facial expression into three separate latent spaces. Using MonoNPHM as a prior, we tackle dynamic 3D head reconstruction from monocular RGB videos via inverse, SDF-based volumetric rendering.
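For readers unfamiliar with SDF-based volume rendering, here is a hedged sketch of the general recipe (a VolSDF-style Laplace density; MonoNPHM's exact formulation differs): the signed distance is converted to a volume density, which is then alpha-composited along camera rays so that photometric losses can be backpropagated into the SDF.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def laplace_cdf(s, beta):
    # CDF of a zero-mean Laplace distribution with scale beta
    return torch.where(s <= 0, 0.5 * torch.exp(s / beta),
                       1 - 0.5 * torch.exp(-s / beta))

def sdf_to_density(sdf, alpha=50.0, beta=0.05):
    # negative SDF (inside the surface) maps to high density
    return alpha * laplace_cdf(-sdf, beta)

def render_rays(sdf_fn, rgb_net, rays_o, rays_d, near=0.5, far=1.5, n=64):
    tvals = torch.linspace(near, far, n)
    pts = rays_o[:, None] + tvals[None, :, None] * rays_d[:, None]  # (R, n, 3)
    flat = pts.reshape(-1, 3)
    sigma = sdf_to_density(sdf_fn(flat)).reshape(pts.shape[:2])     # (R, n)
    delta = (far - near) / n
    alpha = 1 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha[:, :-1] + 1e-10], 1), 1)
    weights = alpha * trans                                         # (R, n)
    rgb = rgb_net(flat).reshape(*pts.shape[:2], 3)
    return (weights[..., None] * rgb).sum(dim=1)                    # (R, 3)

sdf_net = nn.Sequential(nn.Linear(3, 64), nn.Softplus(), nn.Linear(64, 1))
rgb_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())
rays_o = torch.zeros(8, 3)
rays_d = F.normalize(torch.randn(8, 3), dim=-1)
color = render_rays(lambda x: sdf_net(x).squeeze(-1), rgb_net, rays_o, rays_d)
```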
Cite
@inproceedings{giebenhain2024mononphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={MonoNPHM: Dynamic Head Reconstruction from Monocular Videos},
booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
NeRSemble reconstructs high-fidelity dynamic radiance fields of human heads. We combine a deformation field for coarse movements with an ensemble of 3D multi-resolution hash encodings, which act as expression-dependent volumetric textures that model fine-grained, expression-dependent details. Additionally, we propose a new 16-camera multi-view capture dataset (7.1 MP resolution, 73 frames per second) containing 4,700 sequences of more than 220 human subjects.
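The ensemble idea can be sketched as follows, with dense feature grids standing in for the paper's multi-resolution hash encodings (an assumed simplification): per-frame weights blend several volumetric feature fields into one expression-dependent field that is queried at deformed sample points.

```python
import torch
import torch.nn.functional as F

n_grids, feat, res = 4, 8, 32
grids = torch.nn.Parameter(torch.randn(n_grids, feat, res, res, res) * 0.01)
frame_weights = torch.nn.Parameter(torch.zeros(1000, n_grids))  # one row per frame

def blended_features(x, frame_idx):
    """x: (N, 3) deformed points in [-1, 1]^3; returns (N, feat)."""
    w = torch.softmax(frame_weights[frame_idx], dim=-1)          # (n_grids,)
    g = x.view(1, 1, 1, -1, 3)                                   # grid_sample layout
    samples = F.grid_sample(grids, g.expand(n_grids, -1, -1, -1, -1),
                            align_corners=True)                  # (n_grids, feat, 1, 1, N)
    samples = samples.squeeze(2).squeeze(2).permute(0, 2, 1)     # (n_grids, N, feat)
    return (w[:, None, None] * samples).sum(dim=0)

x = torch.rand(4096, 3) * 2 - 1
feats = blended_features(x, frame_idx=42)
```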
Cite
@article{kirschstein2023nersemble,
author = {Kirschstein, Tobias and Qian, Shenhan and Giebenhain, Simon and Walter, Tim and Nie\ss{}ner, Matthias},
title = {NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads},
year = {2023},
issue_date = {August 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {42},
number = {4},
issn = {0730-0301},
url = {https://doi.org/10.1145/3592455},
doi = {10.1145/3592455},
journal = {ACM Trans. Graph.},
month = {jul},
articleno = {161},
numpages = {14},
}
NPHM: Learning Neural Parametric Head Models
NPHM is a field-based neural parametric model for human heads that represents identity geometry implicitly in a canonical space and models expressions as forward deformations. The SDF in canonical space is represented as an ensemble of local MLPs centered around facial anchor points. To train our model, we captured a large dataset of complete head geometry containing over 250 people in 23 expressions each, using high-quality structured light scanners.
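A hedged sketch of the ensemble-of-local-MLPs construction (layer sizes and the Gaussian blending below are illustrative assumptions, not NPHM's exact design): each small MLP sees coordinates relative to its own anchor, and the local SDF predictions are blended with distance-based weights.

```python
import torch
import torch.nn as nn

class LocalEnsembleSDF(nn.Module):
    def __init__(self, n_anchors=8, hidden=64):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(n_anchors, 3) * 0.1)
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_anchors))

    def forward(self, x, sigma=0.1):
        # Gaussian blend weights from squared distance to each anchor
        d2 = ((x[:, None, :] - self.anchors[None]) ** 2).sum(-1)   # (N, A)
        w = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)
        # each local MLP sees coordinates relative to its own anchor
        preds = torch.stack([mlp(x - a).squeeze(-1)
                             for mlp, a in zip(self.mlps, self.anchors)], dim=-1)
        return (w * preds).sum(-1)                                 # blended SDF

sdf = LocalEnsembleSDF()
values = sdf(torch.rand(1024, 3) * 2 - 1)                          # (1024,)
```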
Cite
@inproceedings{giebenhain2023nphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={Learning Neural Parametric Head Models},
booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year = {2023}
}
Language-Agnostic Representation Learning of Source Code from Structure and Context
We present CodeTransformer, which combines source code (Context) and parsed abstract syntax trees (ASTs; Structure) for representation learning on code. Context and Structure are two complementary representations of the same computer program, and we show the benefit of combining both for the task of method name prediction. To achieve this, we propose an extension to Transformer architectures that can handle both graph and sequential inputs.
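One common way to inject graph structure into sequence attention, sketched below, is to embed pairwise AST distances between tokens and add them to the attention logits. This is the general mechanism only; CodeTransformer's actual relation encoding is more elaborate, and the names here are hypothetical.

```python
import torch
import torch.nn as nn

class DistanceBiasedAttention(nn.Module):
    def __init__(self, dim=64, n_buckets=32):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.bias = nn.Embedding(n_buckets, 1)  # one logit offset per distance bucket

    def forward(self, tokens, dist_buckets):
        # tokens: (N, dim); dist_buckets: (N, N) bucketized AST distances
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        logits = q @ k.t() / q.shape[-1] ** 0.5
        logits = logits + self.bias(dist_buckets).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ v

attn = DistanceBiasedAttention()
tokens = torch.randn(10, 64)            # token embeddings of a method body
dist = torch.randint(0, 32, (10, 10))   # e.g. shortest-path distance in the AST
out = attn(tokens, dist)
```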
Cite
@inproceedings{zuegner2021codetransformer,
author = {Daniel Z{\"{u}}gner and
Tobias Kirschstein and
Michele Catasta and
Jure Leskovec and
Stephan G{\"{u}}nnemann},
title = {Language-Agnostic Representation Learning of Source Code from Structure and Context},
booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
Virtual Event, Austria, May 3-7, 2021},
year = {2021},
url = {https://openreview.net/forum?id=Xh5eMZVONGF},
}
End-to-End Learning for Dimensional Emotion Recognition from Physiological Signals
We show that end-to-end Deep Learning can replace traditional feature engineering in the signal processing domain. Not only does a combination of convolutional layers and LSTMs perform better on the task of emotion recognition, but we also demonstrate that some cells' activations in the convolutional network are highly correlated with hand-crafted features.
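A minimal sketch of the conv + LSTM architecture described above (layer sizes are illustrative, not the paper's): raw 1D physiological signals in, per-timestep valence/arousal predictions out.

```python
import torch
import torch.nn as nn

class ConvLSTM(nn.Module):
    def __init__(self, in_ch=1, conv_ch=32, hidden=64, out_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, conv_ch, kernel_size=8, stride=2), nn.ReLU(),
            nn.Conv1d(conv_ch, conv_ch, kernel_size=8, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(conv_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)  # valence and arousal

    def forward(self, x):                       # x: (B, 1, T) raw signal
        h = self.conv(x).transpose(1, 2)        # (B, T', conv_ch)
        h, _ = self.lstm(h)
        return self.head(h)                     # (B, T', 2)

model = ConvLSTM()
signal = torch.randn(4, 1, 1024)                # e.g. ECG / EDA windows
pred = model(signal)
```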
Cite
@inproceedings{keren2017end,
title={End-to-end learning for dimensional emotion recognition from physiological signals},
author={Keren, Gil and Kirschstein, Tobias and Marchi, Erik and Ringeval, Fabien and Schuller, Bj{\"o}rn},
booktitle={2017 IEEE International Conference on Multimedia and Expo (ICME)},
pages={985--990},
year={2017},
organization={IEEE}
}
Teaching
3D Scanning & Spatial Learning Practical
Offered and supervised projects for teams of 2-3 students on the following topics:
- Codec Avatars for Teleconferencing
- Intuitive Face Animation through Sparse Deformation Components
- Multi-view Stereo via Inverse Rendering
- Synthetic 3D Hair Reconstruction
3D Scanning & Spatial Learning Practical
Offered and supervised projects for teams of 2-4 students on the following topics:
- 3D Face Reconstruction and Tracking
- Intuitive Speech-driven Face Animation
- Reconstructing surfaces with NeuS and Deep Marching Tetrahedra
- Multi-view 3D Hair Reconstruction
Reviewing
CVPR
- 2024: 4 papers
SIGGRAPH
- 2024: 2 papers
SIGGRAPH Asia
- 2024: 5 papers