I am a Principal Applied Scientist at Amazon AGI, and I collaborate closely with AWS AI Labs. Before this, I was an applied scientist at Amazon Halo, where I worked on problems at the intersection of Computer Vision and Health.
Prior to Amazon, I was a Principal Computer Vision Researcher at Magic Leap. I received my Ph.D. from the School of Interactive Computing at the Georgia Institute of Technology, where I was advised by Professor Henrik I. Christensen and Professor Frank Dellaert. I completed my Bachelor's and Master's in Computer Science at IIIT Hyderabad, where I was advised by Prof. P. J. Narayanan.
What I’m building toward —
The Multimodal Mind
The next frontier in AI is a single model that understands the physical world well enough to imagine its futures and act to shape them — a natively multimodal foundation model that perceives, reasons, and generates across all modalities (image, video, text, speech, action), organized around a three-stage cognitive loop: Perceive → Imagine → Act. Perceive ingests the current state across modalities in a unified space. Imagine thinks in multimodal space — mental video of the planned trajectory, predicted proprioception, anticipated contact forces — and this imagination is the world model. Act generates whatever output the task requires (motor commands, keyboard/mouse, speech), conditioned on the imagined plan. Under this view, controllable video generation, computer use, robotics, world simulation, autonomous driving, and creative tools are all different routes through the same architecture — and improvements to the shared backbone benefit every task simultaneously. [full writeup]
Tech-lead for Nova 2.0 Lite: Core-team member who designed the multimodal architecture and finalized the pretraining data mix and recipe, integrating vision, speech, and language understanding at scale. Achieves state-of-the-art performance on 13 of 15 benchmarks vs. Claude Haiku 4.5.
Nova 2.0 Omni pre-training: Developed a unified multimodal architecture integrating native image generation with text, vision, and speech understanding. Designed a pre-training recipe that enables generative capabilities while preserving performance across all input modalities. Check out the models at nova.amazon.com/chat.
Tech-lead on the multimodal pre-training team responsible for developing image/video pretraining recipes for the Amazon Nova foundation models. Achieved state-of-the-art performance on multimodal understanding benchmarks against Claude 3 Haiku and Gemini 1.5 Pro. The family includes Nova Pro, Nova Lite, Nova Micro, Nova Canvas, and Nova Reel.
We address hallucination in Generative Vision-Language Models by introducing Multi-Modal Mutual-Information Decoding (M3ID), which amplifies the influence of reference images, reducing hallucinated responses by up to 28% without compromising linguistic fluency.
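The decoding-time idea can be illustrated with a small sketch. This is an assumed contrastive form, not the paper's exact objective: tokens whose likelihood rises when the image is present get boosted relative to tokens driven by the language prior alone, and the `gamma` weight and both logit arrays here are hypothetical.

```python
import numpy as np

def image_contrastive_logits(logits_with_image, logits_text_only, gamma=0.5):
    """Sketch of mutual-information-style decoding (assumed form):
    amplify tokens whose logit increases when the image is conditioned on,
    damping purely language-prior (hallucination-prone) continuations."""
    return logits_with_image + gamma * (logits_with_image - logits_text_only)

# Toy two-token vocabulary: token 0 is supported by the image,
# token 1 is favored only by the language prior.
lw = np.array([2.0, 1.0])   # logits with the image in context
lt = np.array([2.0, 2.0])   # logits from text alone
adj = image_contrastive_logits(lw, lt)
```

After adjustment, the margin of the image-supported token over the prior-driven one widens, which is the intended hallucination-reducing effect.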
RAVEN is a multi-task retrieval-augmented vision-language model framework that improves performance on various tasks through efficient fine-tuning without requiring additional parameters.
Propose MeasureNet, a CNN-based model for accurately and reliably predicting body measurements and waist-to-hip ratio. The model is trained on a realistic synthetic dataset.
A fully articulated Gaussian splatting model for human avatars. Our model includes both rigid and non-rigid skinning components, and a Neural Color Field for implicit color regularization.
A two-stage transformer-based system that analyzes real-time fitness videos using pose sequences to detect exercise repetitions, classify movements, and identify form errors with severity levels.
Propose a distributed implementation of two-stage pose graph optimization, using Successive Over-Relaxation (SOR) and Jacobi Over-Relaxation (JOR) as workhorses to split the computation among the robots. Extends it to work with object-based map models.
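As a rough illustration of why JOR-style updates distribute well, here is a minimal sketch on a hypothetical toy linear system (not the paper's pose-graph formulation): each unknown is updated from the previous iterate only, so rows of the system can be assigned to different robots and updated independently within a sweep.

```python
import numpy as np

def jor(A, b, omega=0.8, iters=200):
    """Jacobi Over-Relaxation for A x = b. Every component of x is
    refreshed from the *previous* iterate, so the per-row updates are
    embarrassingly parallel across workers (here, robots)."""
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)            # diagonal entries
    R = A - np.diag(D)        # off-diagonal remainder
    for _ in range(iters):
        x = (1.0 - omega) * x + omega * (b - R @ x) / D
    return x

# Toy diagonally dominant system standing in for a pose-graph solve.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = jor(A, b)
```

SOR differs in that it reuses already-updated components within a sweep, trading some parallelism for faster convergence.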
Multi-robot SLAM approach that uses 3D objects as landmarks for localization and mapping. Leverages local computation at each robot (e.g., object detection and object pose estimation) to reduce the communication burden.
Leverages recent results showing that the maximum-likelihood trajectory is well approximated by a sequence of two quadratic subproblems, and solves them in a distributed manner using the distributed Gauss-Seidel (DGS) algorithm.
Information-theoretic algorithm to efficiently reduce the number of landmarks and poses in a SLAM estimate without compromising the accuracy of the estimated trajectory.
Propose an approach for online object discovery and object modeling, and extend a SLAM system to utilize these discovered and modeled objects as landmarks to help localize the robot in an online manner.
Structure from Motion / GPU Computing
CPU and/or GPU: Revisiting the GPU Vs. CPU Myth
Kishore Kothapalli, Dip Sankar Banerjee, P. J. Narayanan, Surinder Sood, Aman Kumar Bahl, Shashank Sharma, Shrenik Lad, Krishna Kumar Singh, Kiran Matam, Sivaramakrishna Bharadwaj, Rohit Nigam, Parikshit Sakurikar, Aditya Deshpande, Ishan Misra, Siddharth Choudhary, Shubham Gupta
arXiv, 2013
arXiv