16-785 Integrated Intelligence in Robotics: Vision-and-Language Planning / Spring 2024

Time: Mondays, Wednesdays 11:00AM - 12:20PM

Location: Newell-Simon Hall (NSH) 3002

Office Hours: By Appointment

Canvas: Link




Course Description

This is a project-oriented seminar course that covers interdisciplinary topics on cognitive intelligence in robotic systems. Cognitive abilities constitute high-level, humanlike intelligence that exhibits reasoning or problem-solving skills. Such abilities as semantic perception, use of language, and task planning can be built on top of low-level robot autonomy. The topics covered generally bridge across multiple technical areas, for example, vision-language intersection and language-action/plan grounding.

The project theme in Spring 2024 is filmmaking, which presents a range of robotics and machine learning challenges, from content generation (e.g., scenario generation or scene/video synthesis and editing) to robotic automation (e.g., autonomous camera control or autonomous stop-motion control). The class will study both recent and seminal works in the field of intelligent robotics. The learning objectives of the class projects also place special emphasis on research skills, e.g., problem formulation, literature review, ideation, evaluation planning, results analysis, and hypothesis verification. The course is discussion intensive, and thus attendance is required.

Spring 2024 Project: Filmmaking using AI & Robotics


Guillermo del Toro’s Pinocchio (2022), an Oscar-nominated stop-motion animated film, took 15 years to make, with a budget estimated at over 35 million dollars. Another celebrated motion picture, Loving Vincent (2017), was fully hand-painted by 152 artists; its production took 7 years and cost 5.5 million dollars. In this class, we will develop and experiment with ideas for making such films using AI and robotics technologies. Course topics include language models, vision-language models, planning, and learning in the context of robotics.

Examples of class projects include painted movies (robot painting), stop-motion movies (manipulation), animation (text-to-image/video generation), directing (direction following), and robotic cinematography. The final projects will be premiered at the end of the semester.

Pre-requisites

There are no explicit prerequisites for this class, but general background knowledge of AI and machine learning is assumed.

Course Goals

In this course, we will strive to answer the following research questions, among others, toward the goal of developing cognitive capabilities in robots.

  • How can we make robots perform tasks by following natural language instructions?
  • How can we develop robots that can describe in natural language what they perceive through vision or explain what they are doing and why?
  • How can we fuse information coming in multiple modalities, e.g., language and images, to understand context-aware, semantic meanings of sensory data?
  • How do we measure the quality of information translated between different modalities, e.g., how do we measure the quality of language description given an image? What are the limitations and shortcomings of existing metrics?
  • How can we make use of semantic information digested from raw sensory inputs in the process of planning to solve a problem/task?
  • How do we measure the performance of computer vision algorithms outside benchmark datasets, e.g., on robots?
  • How should learned knowledge be stored? Do we need a universal representation for knowledge?
  • How can we make robots learn to improve over time, e.g., by learning new skills?

Using these research questions, we will learn to follow the basic steps of conducting research through class projects.




Policies

Academic Integrity

We formally follow the guidelines in CMU's academic integrity policy.

Reasonable Person Principle (RPP)

We informally follow the Reasonable Person Principle (RPP), a base culture of CMU's School of Computer Science, where everyone gives and gets the benefit of the doubt for trying to be reasonable. The four rules of RPP are the following:

  • Everyone will be reasonable.
  • Everyone expects everyone else to be reasonable.
  • No one is special.
  • Do not be offended if someone suggests you're not being reasonable.

Extensions and Late Assignments

Each student has up to 5 grace days that can be applied to any homework assignments, in any combination, without penalty. (Note that there will NOT be any extension for the final project presentation and report.) For example, you can use all 5 days on the first homework assignment, or split them into 2 and 3 days for the first and second assignments, respectively. After the 5 grace days have been used up, there will be no additional extensions: 50% will be deducted for a submission 1 day after the due date, and no points will be given after 2 days.
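
As an informal illustration only (not an official grading tool), the late policy can be sketched in a few lines of Python. The function name and parameters are hypothetical, lateness is assumed to be counted in whole days, and "no points after 2 days" is read here as "2 or more penalized days earns nothing"; the instructors' interpretation takes precedence.

```python
GRACE_DAYS_TOTAL = 5  # grace days per student, from the policy above


def homework_score_fraction(days_late, grace_days_applied,
                            grace_days_left=GRACE_DAYS_TOTAL):
    """Return the fraction of earned points kept for one homework.

    Hypothetical helper. Assumes whole days and that 2 or more
    penalized days yields zero points.
    """
    if days_late < 0 or grace_days_applied < 0:
        raise ValueError("day counts must be non-negative")
    if grace_days_applied > grace_days_left:
        raise ValueError("cannot apply more grace days than remain")

    # Days of lateness not covered by grace days.
    penalized_days = max(0, days_late - grace_days_applied)
    if penalized_days == 0:
        return 1.0  # on time, or fully covered by grace days
    if penalized_days == 1:
        return 0.5  # 50% deducted
    return 0.0      # assumed: no points at 2 or more penalized days


# Example: 3 days late, covering 2 of them with grace days.
print(homework_score_fraction(days_late=3, grace_days_applied=2))
```

In this sketch, a submission that is 3 days late with 2 grace days applied is treated as 1 day late and keeps half of its points.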




Previous Offerings


Instructors

Teaching Assistants