Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...
Abstract: For long-horizon multi-task robotic manipulation, hierarchical approaches provide an effective way to combine high-level language-based task planning with low-level vision-language-based sub ...