About me

Hi there! I am Avani Gupta. Welcome to my web page.

A brief introduction to my journey so far: I have worked on AI research and development, building competent LLMs from scratch and improving not only the data but also the model architecture. One of the models I worked on, Med42, a medical LLM, is already released on Hugging Face. I researched novel attention mechanisms for LLMs and proposed one in my current work. I have been handling model architecture development as well as synthetic data generation pipelines for pre-training and supervised fine-tuning. I have been involved in end-to-end training of LLMs, including pre-training data and scripts using DeepSpeed, supervised fine-tuning, DPO, and RLHF. As an AI Engineer, I also developed a real-time advanced RAG system for chat support, enhancing customer interaction capabilities. I built agents with various tools, including a custom retrieval agent with advanced multi-modal retrieval, utilizing RAPTOR for document summaries and self-re-RAG. I have tackled multiple challenges and built several additional components to ensure a robust system, such as content moderation and jailbreak-attempt flagging.

I worked on ML interpretability during my master’s at IIIT Hyderabad and presented my paper titled “Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement” at NeurIPS 2023. I also published my work at ICVGIP, where it received the Oral and Best Paper Award. I have been involved in several projects with my advisor, where I contribute to brainstorming and mentoring efforts. These projects focus on interpretability for mitigating backdoor attacks and on style transfer using interpretable concepts. Prior to interpretability, I worked on 3D human models and the NeRF line of work in my lab, CVIT.

I have also worked with IBM Research on Reinforcement Learning in business processes, where I patented and published two works. With IBM Research, I created AI models for forecasting the next steps in business processes and optimizing various KPI goals for the organization.

I am always keen to apply my skills and gather knowledge while working on interesting and novel problems. I have published my research in top AI conferences like NeurIPS, COLING, and CODS-COMAD. I am a fast learner with experience in NLP, Computer Vision, and Reinforcement Learning.

News

December 2023: Med42 released on Hugging Face!
September 22, 2023: Paper “Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement” accepted at NeurIPS 2023!
September 30, 2023: Successfully defended my Master’s thesis.
September 12, 2023: Paper “Predicting Business Process Events in Presence of Anomalous IT Events” accepted at CODS-COMAD 2024.
December 10, 2022: Received the Oral and Best Paper Award for the paper titled “Interpreting Intrinsic Image Decomposition using Concept Activations” at ICVGIP 2022.

Work experience

April 2024 - Present: AI Engineer
- Stealth AI Startup | Abu Dhabi, UAE
  - Trained an LLM end-to-end from scratch to advance the SOTA: from data curation to pre-training to post-training (Supervised Fine-Tuning and Alignment using RLHF).
  - Designed an LLM with a novel attention mechanism and built pipelines for data generation.
  - Built an AI Assistant with various tools, including advanced multi-modal retrieval, utilizing RAPTOR for document summaries and self-re-RAG (using LangChain, FastAPI).
  - Productionized the AI Assistant with Azure OpenAI and Google Cloud.
  - Tackled multiple challenges in the AI Assistant and built components like content moderation and jailbreak-attempt flagging to ensure a robust deployed system.
March 2023 - March 2024: Research Associate
- G42 Healthcare | Abu Dhabi, UAE
  - Trained a foundation model from scratch in a novel setting to predict procedures, diagnoses and medications for patients given their medical history and demographics.
  - Used it for chronic disease identification, mortality prediction, re-admission prediction and personalised medicine.
  - Orchestrated the training dataset (from 10M+ articles) and the evaluation of a clinical LLM. Outcomes: Med42 released on Hugging Face and a paper authored.
Sep 2022 - Dec 2022: Research Intern
- IBM Research | Bangalore, India
  - Worked on forecasting and handling IT errors in Business Processes: paper
May 2021 - Dec 2021: Research Intern
- IBM Research | Bangalore, India
  - Research project on building a system for goal-oriented next-best-action prediction in business processes using Deep Reinforcement Learning.
  - Submitted a paper and a U.S. patent (currently in the last stage after signing).
May 2020 - March 2023: Researcher
- CVIT, IIIT Hyderabad
  - Worked on ML interpretability applied to Computer Vision and Graphics under Professor P. J. Narayanan.
  - Developed novel interpretability-based model evaluation and training methods.
  - Used human-centered abstract concepts for model disentanglement evaluation and fine-tuning via a proposed loss function.
  - Concepts helped align the model with human understanding, thereby improving model generalization.
  - Used concepts to debias models against complex biases, such as age in gender classification, and to induce prior knowledge in a real-world reconstruction problem.
  - Also worked on neural rendering, ray tracing and 3D reconstruction of objects and scenes. Studied the NeRF (Neural Radiance Fields) line of work. Implemented and reproduced the results of several papers in neural rendering.
Jan 2020 - Jan 2021: Independent Study Researcher
- CVIT, IIIT Hyderabad
  - Worked on realistic human body reconstruction, digital humans, and temporal stability over 3D animations with Professor Avinash Sharma.
June 2020 - July 2020: Crew Member and Mentee
- Microsoft | Mars Colonization Program
  - Worked on an automated Mars rover web game.
  - Developed the game in an agent-centric way.
  - Used shortest-path-finding algorithms like Collaborative Learning Agents, A*, Dijkstra, Best-First Search, IDA*, Jump-Point Finders and their bi-directional forms to make the AI rover navigate Mars.
  - Applied the Travelling Salesman algorithm so that the AI agent covers multiple destinations along the shortest path while avoiding all obstacles.
  - Built using object-oriented programming concepts. Used jQuery, Raphael.js, HTML, CSS and JavaScript.
Jan 2020 - May 2020: Applied Deep Learning and Software Engineering Intern
- Scrapshut | Hyderabad
  - Developed a web app using Angular and Django where users can check the genuineness of any site by providing its URL and get other users’ reviews along with predictions from a DL model.
  - Trained various models like LSTM, XGBoost and CNN on three Kaggle datasets: Fake News Net, Getting Real about Fake News, and Fake News Prediction.
  - Also trained a passive-aggressive classifier (an online learning algorithm) and incorporated user-rated scraped reviews for real-time prediction.
Nov 2019 - Jan 2020: RL Researcher
- Robotics Research Centre
  - Worked on several SOTA RL algorithms in Robotics and Control under Professor Madhav Krishna.
  - Implemented algorithms from Monte Carlo to PPO, TRPO, DDPG, etc. from scratch.
  - Also used OpenAI Gym, RLlib, Vowpal Wabbit, and engines like Gazebo and MuJoCo for control in robotics.
