Artificial Intelligence (AI) has made remarkable progress in tasks like image recognition, natural language understanding, and video generation, largely driven by Deep Learning. Transformer models, built on the self-attention mechanism, have revolutionized Natural Language Processing (NLP) and Computer Vision (CV). However, self-attention's compute and memory costs grow quadratically with sequence length, which hinders deployment on resource-limited hardware and drives up computational and environmental costs. Efficient AI deployment demands reducing these overheads.
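To make the quadratic cost concrete, the following is a minimal sketch of single-head scaled dot-product attention in NumPy; it is illustrative only, and all names and sizes here are assumptions rather than details of the works presented. The n-by-n score matrix is the source of the quadratic scaling in sequence length n.

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention for a single head.

        x: (n, d) sequence of n token embeddings of width d.
        The (n, n) score matrix below is what makes compute and
        memory grow quadratically with sequence length n.
        """
        q, k, v = x @ w_q, x @ w_k, x @ w_v             # each (n, d)
        scores = q @ k.T / np.sqrt(k.shape[-1])         # (n, n): O(n^2)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ v                              # (n, d)

    n, d = 512, 64
    rng = np.random.default_rng(0)
    x = rng.normal(size=(n, d))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)
    # Doubling n quadruples the score matrix, hence the deployment cost.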
My first work introduces an adaptive Transformer architecture trained with progressive pruning: tokens and attention heads are pruned dynamically at inference time, so the accuracy-efficiency tradeoff can be tuned across NLP tasks without retraining. The second work extends this idea to Vision Transformers, dynamically pruning redundant image patches to accelerate Fine-Grained Visual Classification (FGVC). Lastly, I propose a software-hardware co-designed Processing-in-Memory (PIM) framework that supports token-adaptive Transformers, maximizing throughput and energy efficiency while reducing data-movement costs.
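As a rough illustration of the general token-pruning idea (not the exact method from these works), the sketch below keeps only the tokens most attended to by a [CLS]-style summary token. The function name, the use of the [CLS] row as an importance score, and the keep_ratio knob are all hypothetical choices for this example; keep_ratio stands in for the tunable accuracy-efficiency tradeoff that needs no retraining.

    import numpy as np

    def prune_tokens(x, attn, keep_ratio=0.5):
        """Keep the most-attended tokens; drop the rest.

        x:    (n, d) token embeddings entering a Transformer layer.
        attn: (n, n) attention weights from the previous layer; row 0
              is assumed to belong to a [CLS]-style summary token.
        keep_ratio: fraction of tokens to retain. Lowering it reduces
        compute in all later layers at some cost in accuracy.
        """
        n = x.shape[0]
        k = max(1, int(n * keep_ratio))
        importance = attn[0]                         # attention [CLS] pays to each token
        keep = np.sort(np.argsort(importance)[-k:])  # top-k, original order preserved
        return x[keep], keep

    # Illustrative usage: prune half of a 16-token sequence.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 8))
    attn = rng.random((16, 16))
    attn /= attn.sum(axis=-1, keepdims=True)         # normalize rows like softmax output
    pruned, kept = prune_tokens(x, attn, keep_ratio=0.5)  # pruned has shape (8, 8)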