AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered for a particular robot model and deployment environment, which scales poorly as the pool of robot models expands and the variety of operating environments increases.

A 9.7fJ/Conv.-Step Capacitive Sensor Readout Circuit with Incremental Zoomed Time Domain Quantization

This paper presents a capacitive sensor readout circuit with an incremental zoomed current-controlled oscillator (CCO)-based time-domain (TD) ΔΣ modulator (ΔΣM). It supports single-sensor measurement, leading to 2x savings in sensing hardware compared to dual-sensor schemes. A low-cost 5b TD-ΔΣM is implemented with a 7-stage ring-CCO and a double-PFD (DPFD) quantizer. To further boost its speed, a fast start-up scheme is designed for the CCO quantizer.
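The zoomed, CCO-based circuitry is analog, but the incremental ΔΣ principle the readout relies on (integrate the input, balance it against a reference, and count the balancing events over a fixed number of cycles) can be illustrated with a short behavioral model. The sketch below is a generic first-order incremental modulator in Python, not the paper's 5b time-domain design; the cycle count and reference value are illustrative assumptions.

```python
def incremental_dsm(vin, n_cycles=64, vref=1.0):
    """Generic first-order incremental delta-sigma ADC, behavioral model only."""
    acc = 0.0
    count = 0
    for _ in range(n_cycles):
        acc += vin                 # integrate the (normalized) sensor input
        if acc >= vref:            # charge-balancing feedback event
            acc -= vref
            count += 1
    return count / n_cycles        # digital estimate of vin / vref

print(incremental_dsm(0.37))       # ~0.36, i.e. 0.37 within 1/64 quantization error
```

More cycles per conversion shrink the quantization step, which is the usual resolution-versus-speed trade-off that the paper's fast start-up scheme for the CCO quantizer is meant to ease.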

RVT: Robotic View Transformer for 3D Object Manipulation

For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at a large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention mechanism to aggregate information across views and re-rendering of the camera input from virtual views around the robot workspace.
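As a concrete illustration of the "aggregate information across views with attention" idea, here is a minimal PyTorch sketch in which patch tokens from several virtual views are concatenated into one sequence and processed with joint self-attention. The token counts, embedding size, and single attention layer are assumptions for illustration, not RVT's actual architecture.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Minimal sketch: joint self-attention over tokens from multiple rendered views."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_tokens):
        # view_tokens: (batch, views, tokens_per_view, dim)
        b, v, n, d = view_tokens.shape
        x = view_tokens.reshape(b, v * n, d)   # concatenate tokens from all views
        out, _ = self.attn(x, x, x)            # every token can attend to every view
        x = self.norm(x + out)
        return x.reshape(b, v, n, d)

# Example: 5 virtual views around the workspace, each producing 196 patch tokens.
tokens = torch.randn(2, 5, 196, 256)
fused = CrossViewAttention()(tokens)
```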

DeePattern: Layout Pattern Generation with Transforming Convolutional Auto-Encoder

VLSI layout patterns provide critical resources for various design-for-manufacturability (DFM) research, from early technology node development to back-end design and sign-off flows. However, a diverse layout pattern library is not always available due to the long logic-to-chip design cycle, which slows down technology node development. To address this issue, in this paper we explore the capability of generative machine learning models to synthesize layout patterns.
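To make the "generative model for layout patterns" idea concrete, the sketch below shows a minimal convolutional auto-encoder over rasterized layout clips in PyTorch. The clip size, channel counts, and plain (non-transforming) architecture are illustrative assumptions, not the DeePattern network.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Minimal convolutional auto-encoder for binary layout clips (sketch only)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoEncoder()
clips = torch.rand(8, 1, 128, 128)             # stand-in for rasterized layout clips
recon = model(clips)
loss = nn.functional.binary_cross_entropy(recon, clips)
```

Once such a model captures the distribution of existing clips, sampling or perturbing its latent codes is one way to enlarge a pattern library without waiting on a full logic-to-chip cycle.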

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom.
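The core optimization pattern suggested by the title (learn a single new word embedding against a frozen text-to-image model) can be sketched as follows. The linear "text encoder" and "generator" below are toy stand-ins for the frozen pretrained components, and the MSE objective replaces the actual diffusion loss; everything here is illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for a frozen text encoder and image generator; in the real method
# these would be the pretrained text-to-image model's components, kept frozen.
text_encoder = nn.Linear(768, 768).requires_grad_(False)
generator = nn.Linear(768, 3 * 64 * 64).requires_grad_(False)

# The only trainable parameter: the embedding of the new pseudo-word "S*".
concept_embedding = nn.Parameter(torch.randn(768) * 0.02)
optimizer = torch.optim.Adam([concept_embedding], lr=5e-3)

target_images = torch.randn(4, 3 * 64 * 64)    # stand-in for the user's concept photos

for step in range(100):
    cond = text_encoder(concept_embedding)               # condition on a prompt containing "S*"
    pred = generator(cond).expand(4, -1)
    loss = nn.functional.mse_loss(pred, target_images)   # real method uses the diffusion loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because only the single embedding vector is updated, the pretrained model itself stays untouched and the learned pseudo-word can be dropped into ordinary prompts.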

"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language. This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices. We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting user-specific “personalized” concepts “in the wild”.
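For intuition, a personalized retrieval query over frozen representations might look like the following sketch: a learned concept embedding is composed with an ordinary text query and matched against image embeddings by cosine similarity. The random embeddings and the simple additive composition are placeholder assumptions, not the PerVL method.

```python
import torch
import torch.nn.functional as F

# Stand-ins for embeddings from a frozen vision-language model; in practice these
# would come from the pretrained image and text encoders, kept frozen.
image_embeds = F.normalize(torch.randn(1000, 512), dim=-1)   # gallery of candidate images

# A learned embedding for the user's personal concept ("Fluffy"), combined with an
# ordinary text query such as "<concept> on the sofa".
concept_embed = F.normalize(torch.randn(512), dim=-1)
query_text_embed = F.normalize(torch.randn(512), dim=-1)
query = F.normalize(query_text_embed + concept_embed, dim=-1)

# Personalized retrieval = nearest neighbours of the composed query in image space.
scores = image_embeds @ query
top5 = scores.topk(5).indices
print(top5.tolist())
```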

DeXtreme: Transfer of Agile In-Hand Manipulation from Simulation to Reality

Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated.
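Sim-to-real transfer of this kind typically leans on domain randomization: physics and observation parameters are resampled every episode so the policy cannot overfit to one simulator configuration. The sketch below shows the general pattern with made-up parameter names and ranges; it is not DeXtreme's actual randomization setup.

```python
import random

# Illustrative domain-randomization ranges for sim-to-real transfer; the parameter
# names and bounds are assumptions for a generic physics simulator.
RANDOMIZATION = {
    "object_mass_kg":      (0.03, 0.30),
    "object_friction":     (0.5, 1.5),
    "joint_damping_scale": (0.5, 2.0),
    "observation_noise":   (0.0, 0.02),
    "action_latency_ms":   (0.0, 40.0),
}

def sample_episode_params(ranges=RANDOMIZATION):
    """Draw one set of physics/observation parameters per training episode, so the
    policy never sees exactly the same simulated dynamics twice."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

if __name__ == "__main__":
    for episode in range(3):
        print(sample_episode_params())
```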