UT Arlington computer scientists used TACC systems to create synthetic objects for robot training.
William Beksi is an assistant professor in the Department of Computer Science and Engineering at the University of Texas at Arlington, where he founded the Robotic Vision Laboratory. Before joining UT Arlington, he interned at iRobot, the world’s largest consumer robot manufacturer, best known for its Roomba robotic vacuum.
To navigate built environments, robots must be able to sense objects and decide how to interact with them. Researchers at the company were interested in using machine and deep learning to teach their robots about objects, but doing so requires a large collection of images. While millions of photos and videos of rooms exist, none were shot from the vantage point of a robotic vacuum, and training with images taken from a human-centric perspective failed.
Beksi’s research focuses on robotics and computer vision. “I’m interested in developing algorithms that enable machines to learn from their interactions and autonomously acquire the skills necessary for high-level tasks,” he said.
Years later, now leading a research group that includes six Ph.D. students in computer science, Beksi recalled the Roomba training problem and began exploring possible solutions. One manual approach used by some researchers involves a 360-degree camera to capture environments (including rented Airbnb houses) and custom software to stitch the images together. But Beksi did not believe manual capture would work.
Instead, he turned to generative adversarial networks, or GANs, a form of deep learning in which two neural networks compete in a game until the “generator” of new data can fool the “discriminator.” Once trained, such a network could create an endless variety of rooms or outdoor environments containing different kinds of chairs, tables, and vehicles: objects that are slightly different each time yet still identifiable to humans and robots, with recognizable dimensions.
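To make the adversarial setup concrete, here is a minimal, self-contained sketch of that generator-versus-discriminator game in PyTorch, using toy 2-D data; the network sizes, data, and training schedule are illustrative assumptions, not the architecture used in Beksi’s work.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator for 2-D points; real GANs for images or
# point clouds use far larger networks, but the adversarial game is the same.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=128):
    # Stand-in for real training data: points on a unit circle.
    angles = torch.rand(n, 1) * 6.2832
    return torch.cat([angles.cos(), angles.sin()], dim=1)

for step in range(2000):
    # Discriminator: label real samples 1 and generated samples 0.
    real = real_batch()
    fake = G(torch.randn(real.size(0), 16)).detach()
    loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator call its output "real".
    fake = G(torch.randn(128, 16))
    loss_g = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

The two losses pull in opposite directions: the discriminator gets better at spotting generated samples while the generator gets better at producing samples that pass for real, which is the competition described above.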
He explained that you can “perturb these objects, move them into new positions, use different colors, texture, and lights to render them into a training picture that could be used as a dataset.” This approach could yield essentially unlimited data for training a robot.
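As an illustration of that kind of perturbation, the NumPy sketch below randomly repositions and recolors a synthetic object represented as a colored point cloud; the function name, array layout, and value ranges are assumptions made for the example, not the lab’s pipeline.

```python
import numpy as np

def perturb(points, colors, rng=None):
    """Randomly reposition and recolor one synthetic object.

    points: (N, 3) array of xyz coordinates; colors: (N, 3) RGB values in [0, 1].
    Both the layout and the perturbation ranges are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Random rotation about the vertical (z) axis.
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])

    # Random translation within a 2 m x 2 m patch of floor.
    shift = np.array([rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0])

    # Simple brightness jitter on the colors.
    jitter = rng.uniform(0.8, 1.2)
    return points @ rot.T + shift, np.clip(colors * jitter, 0, 1)

# Example: one random placement of a hypothetical 1,000-point object.
pts = np.random.rand(1000, 3)
cols = np.random.rand(1000, 3)
new_pts, new_cols = perturb(pts, cols)
```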
Manually designing these objects would take a huge amount of time and resources, whereas a trained generative network can produce them in seconds, said Mohammad Samiul Arshad, a graduate student in Beksi’s research group.
Generating Objects for Synthetic Scenes
Frustrated by his initial attempts to create photorealistic full scenes, Beksi and his group took a step back, reviewed current research, and decided to start at a smaller scale: generating simple objects in environments.
At the International Conference on 3D Vision (3DV) in November 2020, Beksi and Arshad presented PCGAN, the first conditional generative adversarial network to generate dense colored point clouds in an unsupervised mode. Their paper, “A Progressive Conditional Generative Adversarial Network for Generating Dense 3D Point Clouds,” demonstrates that the network can learn from a training set (derived from ShapeNetCore, a CAD model database) and mimic a 3D data distribution to produce colored point clouds with fine detail at multiple resolutions.
“There was some work on generating synthetic objects using these CAD model datasets,” he said, “but no one could yet handle color.”
To test their method, Beksi’s group used a variety of shapes, running experiments on tables, chairs, sofas, and airplanes. The approach gives researchers access to a nearly infinite variety of versions of the objects the deep learning system generates.
He explained that the model first learns an object’s basic structure at low resolution and then gradually adds finer detail. The network also learns the relationship between an object’s geometry and its colors; for example, the legs of a chair or table share one color while the seat or top is contrasting. The group is starting with small objects and building toward entire synthetic scene generation, which would be extremely useful for robotics.
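The sketch below shows one way such a coarse-to-fine, class-conditioned generator could be organized in PyTorch; the two-stage split, layer sizes, point counts, and class indices are hypothetical simplifications, not the published PCGAN architecture.

```python
import torch
import torch.nn as nn

class ConditionalPointGenerator(nn.Module):
    """Hypothetical sketch of a class-conditioned, coarse-to-fine generator.

    It emits a low-resolution colored point cloud first, then refines it into a
    denser one; the structure is illustrative only.
    """
    def __init__(self, n_classes=4, z_dim=128, coarse=256, dense=2048):
        super().__init__()
        self.coarse, self.dense = coarse, dense
        self.embed = nn.Embedding(n_classes, 64)        # object-class condition
        self.stage1 = nn.Sequential(                    # coarse structure
            nn.Linear(z_dim + 64, 512), nn.ReLU(),
            nn.Linear(512, coarse * 6))                 # xyz + rgb per point
        self.stage2 = nn.Sequential(                    # refinement / densification
            nn.Linear(coarse * 6 + 64, 1024), nn.ReLU(),
            nn.Linear(1024, dense * 6))

    def forward(self, z, labels):
        c = self.embed(labels)
        coarse = self.stage1(torch.cat([z, c], dim=1))
        dense = self.stage2(torch.cat([coarse, c], dim=1))
        return coarse.view(-1, self.coarse, 6), dense.view(-1, self.dense, 6)

# One batch of 8 samples of class 0 (the class index is an arbitrary assumption).
gen = ConditionalPointGenerator()
z = torch.randn(8, 128)
coarse_pts, dense_pts = gen(z, torch.zeros(8, dtype=torch.long))
print(coarse_pts.shape, dense_pts.shape)   # (8, 256, 6) and (8, 2048, 6)
```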
The team generated 5,000 random samples and evaluated both the geometry and the color of the point clouds using standard metrics from the field. The results showed that PCGAN can synthesize high-quality point clouds across a range of object classes.
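As an example of the kind of geometric metric commonly used to evaluate point clouds, the snippet below computes the symmetric Chamfer distance with SciPy; the paper’s exact evaluation suite is not reproduced here, so treat this as an illustrative stand-in.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    d_ab, _ = cKDTree(b).query(a)   # nearest neighbor in b for each point of a
    d_ba, _ = cKDTree(a).query(b)   # and vice versa
    return d_ab.mean() + d_ba.mean()

# Compare a generated cloud against a reference cloud (random stand-ins here).
rng = np.random.default_rng(0)
generated = rng.random((1000, 3))
reference = rng.random((1200, 3))
print("geometry Chamfer distance:", chamfer_distance(generated, reference))

# Color quality can be scored similarly by pairing nearest points and comparing RGB.
```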
Sim2Real
Beksi is also working on a related problem known as “sim2real.” Even the best simulations differ from the real world in subtle ways, and sim2real research looks at how to quantify those differences and make simulations more realistic by capturing the physics of a scene, such as friction, collisions, and gravity, and by using ray and photon tracing.
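As a rough illustration of what “capturing the physics of a scene” means, the sketch below drops an object under gravity onto a plane with a chosen friction coefficient using the PyBullet engine; PyBullet is an assumed example here, not necessarily a tool Beksi’s lab uses.

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                  # headless physics simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)                            # gravity

plane = p.loadURDF("plane.urdf")
cube = p.loadURDF("cube_small.urdf", basePosition=[0, 0, 0.5])

# Adjust friction on the ground plane; collisions and gravity are handled by the engine.
p.changeDynamics(plane, -1, lateralFriction=0.9)

for _ in range(240):                                 # about one simulated second at 240 Hz
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(cube)
print("cube rest position:", pos)
p.disconnect()
```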
Next, Beksi’s group will deploy the software on a robot to see how it performs across the sim-to-real domain gap.
Training the PCGAN model was made possible by TACC’s Maverick 2 deep learning resource. Beksi accessed the system through the University of Texas Cyberinfrastructure Research (UTRC) program, which provides computing resources to researchers at any of the UT System’s 14 institutions.
“If you want to increase the resolution to include more points or detail, that comes with an increase in computational cost,” he noted. “We don’t have those hardware resources in my lab, so TACC was vital to accomplishing that.”
Beksi also needed extensive storage for the data. He said the datasets are massive, particularly the 3D point clouds. “We generate hundreds upon hundreds of megabytes per second. Each point cloud contains approximately 1 million points. This requires a lot of storage.”
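A back-of-the-envelope calculation makes the scale concrete; the per-point encoding below (xyz plus RGB stored as 32-bit floats) is an assumption for illustration, not a figure from the paper.

```python
# Assumed encoding: each point stores xyz + RGB as six 32-bit floats.
points_per_cloud = 1_000_000
bytes_per_point = 6 * 4
mb_per_cloud = points_per_cloud * bytes_per_point / 1e6
print(f"~{mb_per_cloud:.0f} MB per point cloud")  # ~24 MB, so a handful of clouds
                                                  # per second already reaches
                                                  # hundreds of MB per second
```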
Beksi admits that robust, fully autonomous robots are still a long way off, but achieving them would benefit many domains, including agriculture, manufacturing, and health care.