AI edits anything in videos, makes 3D videos, new AI beats GPT, AI for surgery, AI for GIS

Introduction

This past week brought a wave of developments in artificial intelligence: an open-source AI that edits videos from text prompts, another that turns a single image into consistent 3D video, a robotic assistant that performs surgical tasks with precision, and a tool that generates drum beats from simple taps. Plus, we've seen a tiny open-source model outperform top-tier models, including GPT-4 and Claude 3.5. Let’s take a closer look at each of them.

AutoVFX: Revolutionizing Video Editing

Developed at the University of Illinois, AutoVFX is an open-source AI tool that edits videos from simple text prompts. It can add visual effects or insert objects into an existing video based purely on the user’s input. For instance, the prompt “throw a basketball with fire towards the vase with flowers and break the vase with collision” produces an output video depicting exactly that scene. Other examples include melting the vase into liquid and inserting characters such as Pikachu into the video.

AutoVFX doesn't stop at adding characters; it can also manipulate objects already in the video, for example by resizing them or changing their textures. It also handles more complex tasks, such as inserting animated cars into driving videos and other edits that involve motion.

The process involves three stages: scene modeling, prompt interpretation, and task execution. The good news? A portion of the code has been released on GitHub, allowing enthusiasts to install and run the model locally.
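
To make the three stages concrete, here is a minimal toy sketch of how such a pipeline could be wired together. It is not the tool's actual code: every name in it (Scene, EditOp, model_scene, interpret_prompt, execute, edit_video) is hypothetical, the scene is reduced to a set of object names, and the prompt "grammar" is a made-up stand-in for real language understanding.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Toy stand-in for a reconstructed scene: just a set of object names."""
    objects: set[str] = field(default_factory=set)


@dataclass
class EditOp:
    """One edit instruction, e.g. insert 'pikachu' or remove 'vase'."""
    action: str  # "insert" or "remove"
    target: str


def model_scene(detected_objects: list[str]) -> Scene:
    """Stage 1 (scene modeling): build a scene representation."""
    return Scene(objects=set(detected_objects))


def interpret_prompt(prompt: str) -> list[EditOp]:
    """Stage 2 (prompt interpretation): turn text into edit operations.

    Hypothetical mini-grammar: 'insert X' or 'remove X', separated by ';'.
    Clauses that do not match are ignored.
    """
    ops = []
    for clause in prompt.lower().split(";"):
        words = clause.split()
        if len(words) == 2 and words[0] in {"insert", "remove"}:
            ops.append(EditOp(action=words[0], target=words[1]))
    return ops


def execute(scene: Scene, ops: list[EditOp]) -> Scene:
    """Stage 3 (task execution): apply each operation to a copy of the scene."""
    objects = set(scene.objects)
    for op in ops:
        if op.action == "insert":
            objects.add(op.target)
        elif op.action == "remove":
            objects.discard(op.target)
    return Scene(objects=objects)


def edit_video(detected_objects: list[str], prompt: str) -> Scene:
    """Run the three stages in order."""
    scene = model_scene(detected_objects)
    ops = interpret_prompt(prompt)
    return execute(scene, ops)
```

Calling edit_video(["vase", "table"], "insert pikachu; remove vase") returns a scene containing only the table and Pikachu. The point of the sketch is the separation of concerns: the prompt is translated into explicit operations before anything touches the scene.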

DimensionX: Creating 3D and 4D Scenes from Single Images

Another tool released this week is DimensionX, which generates 3D or even 4D scenes (3D scenes that also change over time) from a single image. It extrapolates beyond the input, filling in the parts of the scene around the subject that the image does not show. Users can control camera movements such as zooms and pans.

DimensionX excels at producing realistic 3D scenes from simple images, including 360° camera rotations and multiple perspectives, which could help filmmakers and content creators in their projects.

Tria: The Rhythm AI

In music, Tria takes two audio samples: a drum sound and a beat created by the user, such as a tapped rhythm. It then maps the drum sounds onto the user’s rhythm, producing a more complex, professional-sounding drum track. The code for Tria has not been released yet, but its demo is impressive.
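
For intuition only, the idea of mapping a drum sound onto a tapped rhythm can be illustrated with a much simpler technique than whatever Tria uses internally: threshold-based onset detection followed by sample triggering. The sketch below is an assumption-laden toy, not Tria's method. The function names, the threshold, and the min_gap parameter are all hypothetical, and audio is represented as plain lists of floats.

```python
def detect_onsets(signal: list[float], threshold: float = 0.5,
                  min_gap: int = 4) -> list[int]:
    """Return indices where |signal| reaches the threshold,
    keeping only hits at least min_gap samples after the previous one."""
    onsets: list[int] = []
    for i, x in enumerate(signal):
        if abs(x) >= threshold and (not onsets or i - onsets[-1] >= min_gap):
            onsets.append(i)
    return onsets


def render_drums(taps: list[float], drum: list[float],
                 threshold: float = 0.5, min_gap: int = 4) -> list[float]:
    """Mix a copy of the drum sample into the output at every detected tap."""
    # Extra room at the end so a hit on the last tap sample still fits.
    out = [0.0] * (len(taps) + len(drum))
    for onset in detect_onsets(taps, threshold, min_gap):
        for j, sample in enumerate(drum):
            out[onset + j] += sample
    return out


# Two taps (around index 2 and at index 8) trigger the two-sample "drum".
taps = [0.0, 0.0, 0.9, 0.6, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0]
drum = [1.0, 0.5]
track = render_drums(taps, drum)
```

A learned model can do far more than this, such as varying the drum sounds and adding fills, but the sketch shows the basic contract: timing comes from the taps, timbre comes from the drum sample.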

Nvidia’s Add-it: Image Editing Made Simple

Nvidia has introduced Add-it, an image editor that adds objects or features to an image based purely on text prompts, blending them in seamlessly. Add-it also supports multi-step editing: users can layer changes one prompt at a time, customizing an image while retaining its original context.

AI in Medicine: Surgical Robots

Researchers at Johns Hopkins have developed a surgical AI robot that learned surgical techniques by watching videos of procedures. The robot performed surgical tasks as skillfully as human doctors. This work could pave the way for robots that assist or even replace human surgeons, potentially improving precision and reducing recovery times for patients.

NASA and Microsoft’s Earth Copilot: Geospatial Intelligence

A collaboration between NASA and Microsoft has produced Earth Copilot, a tool for rapid geospatial data analysis. Rather than manually sifting through data sources, users ask questions about parameters such as population density or air quality, and the AI retrieves the relevant data. The tool is currently available to NASA researchers; whether it will be opened to the public remains uncertain.

Keywords

AutoVFX
DimensionX
Tria
Nvidia Add-it
Surgical AI
Earth Copilot
Geospatial Intelligence
AI for video editing
3D scene generation
Robotics in surgery

FAQ

1. What is AutoVFX? AutoVFX is an open-source AI tool developed at the University of Illinois that edits videos from simple text prompts.

2. How does DimensionX work? DimensionX creates 3D and 4D scenes from a single image and lets users control camera movements.

3. What does the Tria tool do? Tria takes a drum sound and a beat tapped out by the user, then builds a drum track that follows the user's rhythm.

4. How does Nvidia's Add-it improve image editing? Add-it lets users add new objects to an image via prompts while retaining the image's context, and it supports layering changes step by step.

5. What advancements have been made in surgical AI? Researchers at Johns Hopkins developed a surgical robot that learned procedures by watching videos, performing tasks as skillfully as human surgeons.

6. What is NASA's Earth Copilot? Earth Copilot is a tool, developed with Microsoft, that lets researchers retrieve geospatial data by asking questions, simplifying environmental analysis.