Hi!
I am Anooshka Bajaj

Data Scientist, Machine Learning & AI Researcher

download resume

about me

I am a graduate student at Indiana University Bloomington pursuing an M.S. in Data Science.
I completed my B.Tech. in Bio-Engineering
with a Minor in Computer Science Engineering
at IIT Mandi in 2023.

I currently work in the Cognitive AI Lab at Indiana University Bloomington, where my research focuses on mechanistic interpretability, in-context learning, and AI safety and alignment in large language models. My work aims to better understand model behavior and improve the reliability of modern AI systems.

Previously, I worked as a Data Science Intern at TGS, where I developed and fine-tuned foundation models for large-scale geological datasets. Before that, I worked as a full-time Software Engineer at Radisys.

Talk to me about AI, Rationality, and EA, and I'd be happy!

Name

Anooshka Bajaj

Address

Bloomington, IN, United States

Email

anooshkabajaj@gmail.com

profile_image

education

2015 - 2017

Secondary School

Chandigarh, India

CGPA: 10.00

2017 - 2019

Senior Secondary School

Chandigarh, India

Non-Medical: Physics, Chemistry, Maths
Percentage: 90.20%

2019 - 2023

Bachelor of Technology (B.Tech.)

Himachal Pradesh, India

B.Tech. in Bio-Engineering
with Minor in Computer Science Engineering
CGPA: 8.50 / 10.00

2024 - 2026

Master of Science (M.S.)


Bloomington, IN, United States

M.S. in Data Science
GPA: 3.93 / 4.00

experience

  • May 2025 - Aug 2025

    Data Science Intern

    Houston, TX, United States

    Developed and optimized a large-scale machine learning foundation model for geological data.
    Improved the model's architecture and fine-tuned it on industry datasets, enabling more accurate predictions for subsurface formation analysis.

  • Bloomington, IN, United States

    Working in the Cognitive AI Lab under , focusing on in-context learning mechanisms in Transformers. Investigating the temporal dynamics of attention heads in large language models and drawing parallels to human memory.

    Sep 2024 - present

    Research Assistant

  • July 2023 - June 2024

    Software Engineer

    Bangalore, Karnataka, India

    Worked with the OAM (Operations, Administration, and Maintenance) team to improve the 5G Network infrastructure.
    Handled requests across various modules:
    CM (Configuration Management): added new configurations to the YANG model according to 3GPP specifications, and
    PM (Performance Management): maintained performance counters by modifying the codebase in C++ and Python.

  • Indore, Madhya Pradesh, India (remote)

    Used machine learning algorithms to build models on existing data for predicting used-car prices and determining whether a used car is worth its posted price.
    Preprocessed the large dataset in Python and performed EDA. Used BigQuery to create the data warehouse and Google Cloud Composer to manage the ETL workflow.

    July 2022 - Sep 2022

    Data Engineer Intern

  • June 2022 - July 2022

    Research Intern

    Bhopal, Madhya Pradesh, India

    Worked under the mentorship of .
    Studied target genes of the YAP transcription factor in the
    Hippo signaling pathway in order to inhibit tumor growth.
    Implemented web scraping in Python and visualized the target genes.
    Also wrote a Python script for cell migration analysis and deployed the web application using Flask.

  • Mandi, Himachal Pradesh, India

    Worked under the mentorship of on this computational biology project.
    Designed indices to increase the throughput of Illumina sequencing.
    Wrote a Perl script to generate barcode sequences and performed quality analysis for error reduction.

    July 2021 - Nov 2021

    Research Project

Publications

Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training

Deven Mahesh Mistry, Anooshka Bajaj, Yash Aggarwal, Sahaj Singh Maini, Zoran Tiganj

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025)

Pages: 8893–8911 | Albuquerque, New Mexico

Abstract:
We investigate in-context temporal biases in attention heads and transformer outputs. Using cognitive science methodologies, we analyze attention scores and outputs of the GPT-2 models of varying sizes. Across attention heads, we observe effects characteristic of human episodic memory, including temporal contiguity, primacy and recency. Transformer outputs demonstrate a tendency toward in-context serial recall. Importantly, this effect is eliminated after the ablation of the induction heads, which are the driving force behind the contiguity effect. Our findings offer insights into how transformers organize information temporally during in-context learning, shedding light on their similarities and differences with human memory and learning.

Citation (APA):
Mistry, D. M., Bajaj, A., Aggarwal, Y., Maini, S. S., & Tiganj, Z. (2025). Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), 8893–8911. https://aclanthology.org/2025.naacl-long.448/

Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models

Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026)

Pages: 7562–7581 | Rabat, Morocco

Abstract:
In-context learning depends not only on what appears in the prompt but also on when it appears. To isolate this temporal component from semantic confounds, we construct prompts with repeated anchor tokens and average the model’s predictions over hundreds of random permutations of the intervening context. This approach ensures that any observed position-dependent effects are driven purely by temporal structure rather than token identity or local semantics. Across four transformer LLMs and three state-space/recurrent models, we observe a robust serial recall signature: models allocate disproportionate probability mass to the tokens that previously followed the anchor, but the strength of this signal is modulated by serial position, yielding model-specific primacy/recency profiles. We then introduce an overlapping-episode probe in which only a short cue from one episode is re-presented; retrieval is reliably weakest for episodes embedded in the middle of the prompt, consistent with "lost-in-the-middle" behavior. Mechanistically, ablating high-induction-score attention heads in transformers reduces serial recall and episodic separation. For state-space models, ablating a small fraction of high-attribution channels produces analogous degradations, suggesting a sparse subspace supporting induction-style copying. Together, these results clarify how temporal biases shape retrieval across architectures and provide controlled probes for studying long-context behavior.

Citation (APA):
Bajaj, A., Mistry, D. M., Maini, S. S., Aggarwal, Y., & Tiganj, Z. (2026). Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026), 7562–7581. https://aclanthology.org/2026.eacl-long.355/

Temporal Dependencies in In-Context Learning: The Role of Induction Heads

Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Billy Dickson, Zoran Tiganj

arXiv preprint

Abstract:
Large language models (LLMs) exhibit strong in-context learning capabilities, but how they track and retrieve information from context remains underexplored. Drawing on the free recall paradigm in cognitive science (where participants recall list items in any order), we show that several open-source LLMs consistently display a serial-recall-like pattern, assigning peak probability to tokens that immediately follow a repeated token in the input sequence. Through systematic ablation experiments, we show that induction heads, specialized attention heads that attend to the token following a previous occurrence of the current token, play an important role in this phenomenon. Removing heads with a high induction score substantially reduces the +1 lag bias, whereas ablating random heads does not reproduce the same reduction. We also show that removing heads with high induction scores impairs the performance of models prompted to do serial recall using few-shot learning to a larger extent than removing random heads. Our findings highlight a mechanistically specific connection between induction heads and temporal context processing in transformers, suggesting that these heads are especially important for ordered retrieval and serial-recall-like behavior during in-context learning.

Citation (APA):
Bajaj, A., Mistry, D. M., Maini, S. S., Aggarwal, Y., Dickson, B., & Tiganj, Z. (2026). Temporal Dependencies in In-Context Learning: The Role of Induction Heads. arXiv preprint arXiv:2604.01094. https://arxiv.org/abs/2604.01094

Who Do LLMs Trust? Human Experts Matter More Than Other LLMs

Anooshka Bajaj, Zoran Tiganj

arXiv preprint

Abstract:
Large language models (LLMs) increasingly operate in environments where they encounter social information such as other agents' answers, tool outputs, or human recommendations. In humans, such inputs influence judgments in ways that depend on the source's credibility and the strength of consensus. This paper investigates whether LLMs exhibit analogous patterns of influence and whether they privilege feedback from humans over feedback from other LLMs. Across three binary decision-making tasks, reading comprehension, multi-step reasoning, and moral judgment, we present four instruction-tuned LLMs with prior responses attributed either to friends, to human experts, or to other LLMs. We manipulate whether the group is correct and vary the group size. In a second experiment, we introduce direct disagreement between a single human and a single LLM. Across tasks, models conform significantly more to responses labeled as coming from human experts, including when that signal is incorrect, and revise their answers toward experts more readily than toward other LLMs. These results reveal that expert framing acts as a strong prior for contemporary LLMs, suggesting a form of credibility-sensitive social influence that generalizes across decision domains.

Citation (APA):
Bajaj, A., & Tiganj, Z. (2026). Who Do LLMs Trust? Human Experts Matter More Than Other LLMs. arXiv preprint arXiv:2602.13568. https://arxiv.org/abs/2602.13568

Media Coverage

New Cohort of Advanced Cyberinfrastructure Student Fellows

Indiana University IT News • Nov 12, 2025

Selected as one of six students across Indiana University for the Advanced Cyberinfrastructure Student Fellowship, supporting my research on interpretability, in-context learning, and confidence alignment in large language models, focusing on analyzing and understanding internal model behavior.

Data Science Research Cyberinfrastructure

IU Luddy Autonomous Race Team Thrives in Latest IAC Test

IU Luddy School of Informatics News • Aug 14, 2025

Highlighted for contributions to the IU Luddy Autonomous Race Team, which placed fourth in the Indy Autonomous Challenge at WeatherTech Raceway Laguna Seca. Contributed to race analysis by developing a tool to integrate and interpret high-frequency, multi-sensor autonomous vehicle data on racetrack maps.

Autonomous Racing Data Analysis Robotics

Challenges, Fun Highlight Computer Vision Course Final Project Poster Session

IU Luddy School of Informatics News • May 16, 2025

Featured for involvement in the Luddy computer vision course through a research-driven project on automated detection of bacterial flagellar motors in electron tomography data. The project focused on applying computer vision methods to accelerate biological discovery by automating a traditionally manual annotation process.

Computer Vision Education ML

Grace Hopper Celebration delivers 'magical' experience for Luddy students

IU Luddy School of Informatics News • Nov 17, 2025

Selected to attend the Grace Hopper Celebration (GHC) in Chicago, the world's largest gathering of women in computing and engineering. Participated in the four-day conference alongside IU Luddy students, connecting with thousands of women in technology, attending sessions and workshops, and networking with leaders shaping the future of tech.

Grace Hopper Conference Networking

Positions of Responsibility


Sports Secretary

Aug '19 - June '20
Gauri Kund Hostel, IIT Mandi


Coordinator - Dance Club

Aug '20 - June '21
Uhl Dance Crew, IIT Mandi


Coordinator - Table Tennis Team

Aug '20 - June '21
IIT Mandi


Fest Team Member
Planning and Publicity Team in Rann Neeti and Exodia

Sep '19 - Apr '20
IIT Mandi

Projects


Google ADK, Vertex AI, FastAPI


  • Built an agentic system for autonomous racing telemetry analysis using Google ADK and Vertex AI on GCP, developing agents that answer natural-language queries over high-frequency sensor data with a FastAPI backend and React dashboard.
  • Designed a multi-agent system with three specialized agents for telemetry data discovery, natural language race analysis, and automated visualization, containerized with Docker and deployed on Google Cloud Run.



  • RAG, LLMs, LangChain, FastAPI


  • LLM-powered Retrieval Augmented Generation (RAG) system for enterprise operations, enabling citation-backed question answering over incident response and customer support documents.
  • Ingested and indexed multiple enterprise documents using LangChain and FAISS for semantic search and context-aware retrieval.
  • This FastAPI service supports document ingestion, vector indexing, retrieval, and evaluation, with Dockerized deployment for scalable enterprise use.



  • Deep Learning, CNN


  • Web application to predict and classify the type of skin cancer from dermatoscopic images uploaded by users.
  • CNN model trained on the HAM10000 dataset, which includes 10,015 dermatoscopic images spanning 7 diagnostic categories.
  • The application lets users upload images of their skin for analysis, providing preliminary disease identification that helps users understand their skin health and encourages timely medical consultation.



  • NoSQL, MongoDB, PowerBI, React, Node.js


  • Designed and deployed a full-stack MERN application for event booking, with role-based access control enabling organizers to list events and attendees to book tickets.
  • Utilized MongoDB as the primary database, implementing indexed schemas across three collections (events, users, and tickets) to reduce data retrieval time through efficient NoSQL document structuring, and integrated it with Power BI for real-time data visualization.
  • Built an interactive React interface with four main components and a Node.js backend, hosted on Render.



  • PowerBI, SQL


  • Interactive Power BI dashboard for analyzing on-time performance (OTP) of the Indianapolis bus transit system, built using PostgreSQL data and SQL queries with geospatial mapping.
  • Analyzed multi-year operational data to evaluate OTP trends, peak vs. off-peak performance, and route-level reliability across different service coverage types.
  • This dashboard enables users to track bus status and delay frequency through interactive filters and maps, supporting operational decision-making and improving passenger transparency.



  • Computer Vision, Object Detection, YOLO


  • Computer vision benchmarking study for detecting small biological structures in cryo-electron tomographic data using YOLO-based object detection models.
  • Benchmarked multiple YOLOv8 and YOLOv9 variants on cryo-electron tomograms to analyze performance trade-offs across model size, efficiency, precision, and recall, identifying the most effective architecture for small-object detection.
  • Mapped detections across slice depths to visualize spatial consistency of predictions, enabling reliable 3D localization of flagellar motors and reducing manual annotation effort in cryo-ET data.



  • Python, Flask


  • Anime recommender web application built with Python and Flask.
  • Implemented a collaborative-filtering recommendation system to find anime similar to a user's preferences.
  • Also implemented a popularity-based recommendation system that ranks the top 100 anime by user ratings.



  • GenAI, NLP, Streamlit


  • Audio summarization tool for converting long-form audio content into concise text summaries.
  • Built a Streamlit-based application to transcribe audio files using OpenAI's Whisper and generate abstractive summaries using a transformer-based language model.
  • This application supports common audio formats and enables users to quickly extract key insights from podcasts, talks, and recorded discussions.



  • Machine Learning, NLP


  • Sentiment analysis of tweets mentioning ChatGPT shortly after its market release to identify public sentiment.
  • Trained several models on the data; a hyperparameter-tuned SVM achieved the best accuracy.



  • R, ANOVA


  • Analyzed an LDL cholesterol measurement dataset in R using two-way ANOVA and linear regression.
  • Conducted a post-hoc test and created a scatter plot of residuals to investigate the effects of medical drugs and lifestyle factors.


  • Achievements


    Flipkart GRiD 3.0
    Hackathon Level 3

    Among the top 31 teams across India to qualify for Level 3 of the Smart Bag Creator Challenge


    Inter IIT Sports Meet
    1st Runner Up

    Won the silver medal at the Inter IIT Sports Meet, Delhi 2023


    Dance Competitions
    Winner

    Won several intra-college dance competitions, including Breakout and Exuberance


    Table Tennis
    Nationals

    Represented Chandigarh in the Sub-Junior TT Nationals at Cochin and Gandhidham. Represented the Punjab, Haryana, HP and Chandigarh Directorate at the NCC National Games in Delhi.

    contact me

    Anooshka Bajaj

    Address:

    Bloomington, IN, United States

    Email:

    anooshkabajaj@gmail.com

    LinkedIn:

    anooshkabajaj