Hi, I'm Manish

I graduated with a BS in Computer Science from Kathmandu University, where I was advised by Prof. Dr. Bal Krishna Bal. Currently, my research interests include, but are not limited to:

  • > LLM Post-Training: RL post-training is meant to elicit reasoning in LLMs by scaling up inference-time compute. However, this recipe is limited to verifiable domains (code, math), while the majority of tasks that actually drive productivity are not verifiable. Does RL help there? Do we need a different paradigm? I intend to explore this.
  • > Multi-agent: Current research focuses on building agents that are good at tool calling, yet tools are still invoked more often than necessary. When should an LLM rely on its parametric knowledge, and when should it call a tool? Moreover, agents are currently used only at inference time. Can we orchestrate multiple agents for training-time tasks like these?

Publications

  • Nepali Encoder Transformers [PDF]
    Proceedings of SIGUL2022 @ LREC2022
    Utsav Maskey, Manish Bhatta, Shiva Raj Bhatta, Sanket Dhungel, Bal Krishna Bal

Projects

  • LilLM [github]
    > pre-training and supervised fine-tuning of a 39M-parameter Llama-like model from scratch

  • RL-GRPO [github]
    > distributed training (FSDP, tensor parallelism) of a Qwen model with the GRPO algorithm in under 1000 LOC; a sketch of the core update follows this list

  • TinyOpenHands [github]
    > tiny from-scratch reproduction of the OpenHands (formerly OpenDevin) agent
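
For a sense of what GRPO does: instead of a learned value baseline as in PPO, it samples a group of completions per prompt, scores them, and normalizes each reward against its own group's statistics. Below is a minimal sketch of that idea, assuming sequence-level log-probs; the function names are mine for illustration, not the repo's actual code:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, group_size], one scalar reward per sampled completion.
    # Group-relative baseline: normalize each reward against its own group's mean/std.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # PPO-style clipped surrogate; log-probs here are summed per sequence,
    # so all tensors are [num_prompts, group_size].
    ratio = (logp_new - logp_old).exp()
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

GRPO as described in the DeepSeekMath paper also applies the advantage per token and adds a KL penalty toward a reference policy; both are omitted here for brevity.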

Blogs

[See more]

Socials

[Twitter] [Github]