Task-to-Instance Prompt Learning for Vision-Language Models at Test Time

  • Zhihe Lu
  • Jiawang Bai
  • Xin Li
  • Zeyu Xiao
  • Xinchao Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Prompt learning has recently been introduced into the adaptation of pre-trained vision-language models (VLMs): a set of trainable tokens is tuned to replace hand-crafted text templates. Despite the encouraging results achieved, existing methods largely rely on extra annotated data for training. In this paper, we investigate a more realistic scenario in which only unlabeled test data is available. Existing test-time prompt learning methods often learn a separate prompt for each test sample. However, relying solely on a single sample heavily limits the performance of the learned prompts, as it neglects the task-level knowledge that can be gained from multiple samples. To that end, we propose a novel test-time prompt learning method for VLMs, called Task-to-Instance PromPt LEarning (TIPPLE), which adopts a two-stage training strategy to leverage both task- and instance-level knowledge. Specifically, we reformulate the effective online pseudo-labeling paradigm and add two tailored components, an auxiliary text classification task and a diversity regularization term, to serve task-oriented prompt learning. The learned task-level prompt is then combined with a tunable residual for each test sample to incorporate instance-level knowledge. We demonstrate the superior performance of TIPPLE on 15 downstream datasets, for example an average improvement of 1.87% over the state-of-the-art method with the ViT-B/16 visual backbone. Our code is open-sourced at https://github.com/zhiheLu/TIPPLE.
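
The abstract names three ingredients without giving formulas: online pseudo-labeling, a diversity regularization term, and a per-sample residual added to a task-level prompt. The sketch below illustrates one generic way such pieces are commonly written. It is not the authors' implementation. The function names, the confidence threshold, the choice of negative entropy of the batch-mean prediction as the diversity term, and the element-wise residual addition are all assumptions made for illustration; the auxiliary text classification task is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(batch_probs, threshold=0.5):
    """Keep (sample_index, class_index) only for confident predictions.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    picked = []
    for i, probs in enumerate(batch_probs):
        conf = max(probs)
        if conf >= threshold:
            picked.append((i, probs.index(conf)))
    return picked

def diversity_penalty(batch_probs):
    """Negative entropy of the batch-mean prediction (assumed form).

    Minimizing this value discourages all samples collapsing onto one class.
    """
    n = len(batch_probs)
    k = len(batch_probs[0])
    mean = [sum(p[c] for p in batch_probs) / n for c in range(k)]
    return sum(q * math.log(q) for q in mean if q > 0)

def instance_prompt(task_prompt, residual):
    """Stage 2 (assumed form): task-level prompt plus a per-sample residual."""
    return [t + r for t, r in zip(task_prompt, residual)]

# Toy image-text similarity logits for three test samples and three classes.
logits = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.2, 0.1, 0.0]]
probs = [softmax(row) for row in logits]
labels = pseudo_labels(probs)        # the third sample is too uncertain to label
penalty = diversity_penalty(probs)   # more negative means more diverse predictions
prompt = instance_prompt([0.1, 0.2], [0.05, -0.05])
```

In this toy setup the task-level stage would minimize a pseudo-label loss plus the diversity penalty over batches of test samples, and the instance-level stage would then tune only the residual for each sample while the task prompt stays fixed.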

Original language: English
Pages (from-to): 1908-1920
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Volume: 34
Publication status: Published - 14 Mar 2025

Keywords

  • Adaptation models
  • Automobiles
  • Dogs
  • Entropy
  • Image recognition
  • Learning systems
  • Prompt learning
  • Task-to-instance
  • Test-time learning
  • Training
  • Training data
  • Vectors
  • Vision-language models
  • Visualization
