Recent foundation models (e.g., DALL-E, CLIP, GPT-3, and ChatGPT) have broken new ground and generated excitement in natural language understanding, computer vision, and other domains. Can foundation models enable intelligent decision making for robots operating in the physical world, and if so, how?