[updated] Something that could be used in robotics too

posted in DreamLand editor
Published December 15, 2023

I'm coding again. Soon I will have a new video, or at least a screenshot, of the new things I've made. Meanwhile I want to talk about something related to what I posted recently on the other forum I frequent. The post was titled “Building a better RTS AI. Starting from the end”. In it I described how, in order to solve a problem, you need to start from the goal and work your way backwards towards the current state of things. This approach can be used not only in an RTS game but also in robotics.

Here is an example. Let's say the goal is to remove a chair from a pile of things. Removing something means retrieving the target object and moving it from location A to location B. To do that you mentally place the object into several intermediate positions; that is the trajectory the object must follow to reach location B. Let's assume the task must be carried out by a robot with 3D vision. To achieve it, the chair, which is our target object number 1, must be scanned and recreated in 3D in a virtual environment, and the other objects in the room must be recreated as well. This way, if target object 1 collides with something, the intersection will be detected in the virtual environment.
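To make this concrete, here is a minimal sketch of that collision check, with axis-aligned boxes standing in for the scanned shapes (the AABB simplification and the names are only illustrative, not how a real robot stack would do it):

```python
# Axis-aligned boxes stand in for the scanned shapes; each object is a pair
# (min_corner, max_corner) of (x, y, z) tuples. Purely illustrative.

def moved(box, offset):
    (mn, mx), (dx, dy, dz) = box, offset
    return ((mn[0] + dx, mn[1] + dy, mn[2] + dz),
            (mx[0] + dx, mx[1] + dy, mx[2] + dz))

def intersects(a, b):
    return all(a[0][i] < b[1][i] and a[1][i] > b[0][i] for i in range(3))

def trajectory_is_clear(target, waypoints, obstacles):
    """Place the target object at each intermediate position of the planned
    trajectory and test it against every other object recreated in the
    virtual environment."""
    for offset in waypoints:
        candidate = moved(target, offset)
        if any(intersects(candidate, obstacle) for obstacle in obstacles):
            return False        # collision detected, the plan must change
    return True

# chair-sized box moving straight up in three steps, one obstacle above it
chair = ((0.0, 0.0, 0.0), (0.5, 1.0, 0.5))
box_on_chair = ((0.1, 1.0, 0.1), (0.4, 1.3, 0.4))
print(trajectory_is_clear(chair, [(0, 0.2, 0), (0, 0.4, 0), (0, 0.6, 0)],
                          [box_on_chair]))   # False: the chair hits the box
```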

I was saying that you need to remove a chair from a pile of things. I don't know much about robotics, but I think that if a robot scans the pile of objects, the result will probably be a soup of polygons. Let's assume that, based on the shape and color (texture) of the items in the pile, the robot is able to break the soup down into the same distinct objects that exist in reality. So now the robot has a virtual representation of the pile. Let's say the pile is made of the chair, a box on the chair, and a ball on the box. To remove the chair the robot needs to move it upwards first, but if it does that the chair will collide with the box. So it needs to remove the box first. The box becomes target object number 2 that needs to be handled. If it moves the box, the ball will roll down and could break something, so the ball needs to be removed carefully as well before the box is taken care of. The ball becomes target object number 3. Now we have a hierarchy of objects the robot needs to work with. To achieve the goal the robot starts with the last object in the hierarchy, target object number 3, and works its way back towards target object number 1 (the order is: remove target object 3, remove target object 2, remove target object 1; removing one object "unlocks" the next one down the hierarchy). Other objects may be lying around as well, but if they don't collide with anything in our hierarchy they should be ignored. That's about it.
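A small sketch of that hierarchy, treated as a dependency walk (the `blocked_by` map is a made-up stand-in for whatever the scan produces):

```python
# To remove the chair (target object 1) we first have to remove whatever
# blocks it, recursively. `blocked_by` maps an object to the objects that
# must be moved before it.

def removal_order(target, blocked_by, order=None, visited=None):
    """Depth-first walk: objects higher in the hierarchy (ball, box) end up
    earlier in the returned list than the object we actually want (chair)."""
    if order is None:
        order, visited = [], set()
    if target in visited:
        return order
    visited.add(target)
    for blocker in blocked_by.get(target, []):
        removal_order(blocker, blocked_by, order, visited)   # unlock blockers first
    order.append(target)
    return order

# the chair is blocked by the box, the box is blocked by the ball
blocked_by = {"chair": ["box"], "box": ["ball"], "ball": []}
print(removal_order("chair", blocked_by))   # ['ball', 'box', 'chair']
```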

[update]

An object in the virtual environment has shape and position. When you want to move the object found in the pile in real life, you have to move it in the virtual world first; if it runs against other objects in the virtual environment, you have to postpone the movement and solve the problem of the objects standing in the way. Basically you're creating a queue of objects that need to be maneuvered. Let's change the initial object setup a bit. There is a chair, a heavy box on the chair, and no ball on the box. Instead, let's say the floor is covered with other objects. To move the chair you need to move it upwards, but if you do that it will run against the box. Now the priority is to move the box rather than the chair; moving the chair becomes the second priority. Since the box is heavy, the robot can't hold it for long, which makes moving it over long distances impossible. So the box needs to be lifted up and placed somewhere nearby. If you do that in the virtual environment, the box will run against the objects on the floor. Because of this, moving the box is no longer the immediate priority; you have to do something first, namely make room on the floor to place the box. Moving the box becomes the second priority and moving the chair the third priority.
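The queue of priorities can be sketched as a stack that grows whenever a move turns out to be blocked (the `prerequisite_of` predicate below is just a placeholder for the collision tests in the virtual world):

```python
# When a task is blocked, its prerequisite is pushed on top of the stack and
# handled first. The dependency table is a toy version of the scenario above.

def plan(initial_task, prerequisite_of):
    """prerequisite_of(task, done) returns the task that must happen first,
    or None if `task` can be executed given what has already been done."""
    stack, done = [initial_task], []
    while stack:
        task = stack[-1]
        prereq = prerequisite_of(task, done)
        if prereq is not None and prereq not in stack:
            stack.append(prereq)          # e.g. "clear floor" before "move box"
        else:
            done.append(stack.pop())      # the task can be carried out now
    return done

def prerequisite_of(task, done):
    deps = {"move chair": "move box", "move box": "clear floor"}
    prereq = deps.get(task)
    return prereq if prereq is not None and prereq not in done else None

print(plan("move chair", prerequisite_of))
# ['clear floor', 'move box', 'move chair']
```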


Comments

JoeJ

Well, I also often think about a very similar toy problem: we have a stack of boxes on a desk, and an empty desk beside it. The task is to move all the boxes from one desk to the other. The biggest problem is achieving a tight packing of the boxes, so we get a new stack which is statically robust and requires as little volume as possible.

That's much more an AI problem than a robotics problem, I think. I see no big problems in lifting a box, keeping balanced while moving to another place, and finally putting it down. Although you missed some details: scanning the object is not enough, since that only tells us its shape, but neither its mass nor its moment of inertia. Those properties are more important than the shape for robotics. But like humans, we can measure mass quickly when lifting the object up, and we can approximate inertia over time by sensing the angular acceleration of the object while we hold it. After we know the mass properties, we can add them to the simulated robot model. If the model can make the real robot walk, it will also work while carrying some heavy weight after adjusting the mass model. And I would do the exact same for character simulation in games, which is why I think your robotics topics are relevant to us. I also think we need character simulation more than the real world needs robots. To make better games, we need to get rid of animation. That's my belief. \:D/
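A rough sketch of that measuring step, with made-up sensor readings standing in for real force and IMU data:

```python
# Infer mass from the force needed to hold the object still, and approximate
# a moment of inertia from measured torque and angular acceleration.
# The numbers below are invented example values, not real sensor output.

G = 9.81  # m/s^2

def estimate_mass(static_hold_force_newtons):
    # holding the object still against gravity: F = m * g
    return static_hold_force_newtons / G

def estimate_inertia(applied_torque_nm, measured_angular_accel):
    # rigid body about one axis: torque = I * alpha  =>  I = torque / alpha
    return applied_torque_nm / measured_angular_accel

mass = estimate_mass(49.05)           # ~5 kg box
inertia = estimate_inertia(0.6, 2.0)  # 0.3 kg*m^2 about the twist axis
print(mass, inertia)
```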

Going back to the hard placement problem to pack the boxes tightly, the first question is: How to represent the world for our simulation and AI needs?
Well, for physics simulation your proposed polygons work well. No matter if we get them from scanned real world or collision geometry as used in games.
It also gives us some useful tools. E.g. game physics engines can often trace a convex hull (== our box) along a ray through the scene, and see where it intersects first.
So you could use that to test an upwards trajectory which your AI has predicted to remove a chair, for example. Or I could use it to ‘shoot’ my box at the little stack of other boxes I have already put in place.
But I would not know if the place I have found is good, or if nearby places would achieve a better box packing. Nor would I know if the current orientation is any good, or if rotating it 90 degrees would give me a better Tetris score.
Figuring this out using polygons is possible, and possibly even fast, but it is very difficult. And with increasing complexity of shapes (teapots vs. boxes … trees vs. teapots) it quickly becomes impractical.
What are the alternatives? And asking more generally: how do we make a stupid computer understand the shapes forming its environment, so it can interact in a meaningful way with this environment?
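As a crude stand-in for such a hull trace, here is a stepped sweep of an axis-aligned box along the up direction (real physics engines do this analytically; the stepping and the scene data are only there to illustrate the query):

```python
# Sweep an axis-aligned box straight up and report the first hit distance,
# the same kind of answer a physics engine's convex cast would give.

def overlaps(a, b):
    # a and b are (min_corner, max_corner) pairs of (x, y, z) tuples
    return all(a[0][i] < b[1][i] and a[1][i] > b[0][i] for i in range(3))

def sweep_up(box, obstacles, max_dist=2.0, step=0.01):
    """Distance at which `box` first touches an obstacle while moving up
    (along y), or None if the whole trajectory up to max_dist is clear."""
    travelled = 0.0
    while travelled <= max_dist:
        moved = ((box[0][0], box[0][1] + travelled, box[0][2]),
                 (box[1][0], box[1][1] + travelled, box[1][2]))
        if any(overlaps(moved, obstacle) for obstacle in obstacles):
            return travelled
        travelled += step
    return None
```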

That's a good question, no? I mean, we've talked about 'mental models' for Terminators already, but we mostly ignored the question of how such a model should be implemented.
I think it's not polygons, nor something like BSP trees. Related data structures are very complex, and keeping them up to date for dynamic scenes is notoriously difficult. And worse: they don't help with building a good model for AI purposes, other than accelerating simple spatial lookups.
That's not what AI needs. It's also not how computer vision works. CV works with images (or temporal sequences of images, i.e. a movie).
Our image may not have RGB values, but rather depth, velocity, object ID. We also have mip maps of the image.
To help with my box problem, I can do this now: depending on the size of my box, I start to sample the mip map at the level where the box is about the size of a pixel. I look for the new box stack in my image using the object ID, and then I search for local maxima in depth on the box stack. I'll find multiple potential places where my box might fit in. To increase accuracy I can now consider mip maps with higher resolutions. By sampling multiple pixels, I can fit the depth values to make a model of 3 planes representing the local environment around the potential placement spot. Similar to how many screen-space graphics tricks like SSAO work, I can build up a local ‘understanding of shapes’ suiting my current problem. How would you do that with polygons? And due to the hierarchical processing, performance costs won't go through the roof at least. Whether it's worth the brute-force cost to generate the image depends on our HW ofc., but that's usually a lot of compute power.
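A small NumPy sketch of that coarse-to-fine search, using an assumed top-down height map in place of the real depth and object-ID buffers:

```python
# Downsample a top-down height image (one "mip" level per call), pick the
# highest cells at the coarse level as candidate placement spots, then refine
# only around those spots. Taking the top-k highest cells is a simplification
# of searching for true local maxima.

import numpy as np

def downsample(height_map):
    # 2x2 max pooling: the max keeps the top surface visible at the coarse level
    h, w = height_map.shape
    return height_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def coarse_candidates(height_map, levels=3, top_k=4):
    """Return (row, col) cells at the coarse mip level with the largest height,
    i.e. the spots where a new box might be placed, plus the coarse map itself."""
    mip = height_map
    for _ in range(levels):
        mip = downsample(mip)
    order = np.argsort(mip, axis=None)[::-1][:top_k]
    return [tuple(np.unravel_index(i, mip.shape)) for i in order], mip
```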

So I'm still on this idea of getting better AI by introducing a better model of vision, and the same would in theory also apply to sound, smell, temperature, or whatever else matters to us. Related values and vector fields also make sense to implement using regular grid data structures. Spatial lookups are cheap, and we can form relations to the nearby space easily, giving us the tools to enable some intelligent behavior.
And ofc. we can make those data structures fully 3D. Pixels become voxels, 2D arrays become 3D arrays, some math becomes harder, but mostly the changes are trivial.
Usually I hate brute force, but in cases where I don't know better it's a good starting point.
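For the "pixels become voxels" part, a tiny occupancy-grid sketch (grid size, cell size and the assumption that coordinates are non-negative are arbitrary choices for the example):

```python
# World positions map to 3D array cells, so a spatial lookup is a constant-time
# index instead of a tree traversal.

import numpy as np

class VoxelGrid:
    def __init__(self, size=(64, 64, 64), cell_size=0.1):
        self.occupancy = np.zeros(size, dtype=np.uint8)   # 0 = empty, 1 = solid
        self.cell_size = cell_size

    def cell_of(self, x, y, z):
        # assumes coordinates stay inside [0, size * cell_size)
        return (int(x / self.cell_size), int(y / self.cell_size), int(z / self.cell_size))

    def mark_solid(self, x, y, z):
        self.occupancy[self.cell_of(x, y, z)] = 1

    def is_free(self, x, y, z):
        return self.occupancy[self.cell_of(x, y, z)] == 0

grid = VoxelGrid()
grid.mark_solid(1.25, 0.3, 2.0)
print(grid.is_free(1.25, 0.3, 2.0))   # False
```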

On the other hand, it still feels silly to rasterize tiny frame buffers for AI agents when we already have an optimized polygon representation of the world in games.
But I think it's worth trying out. Too bad AI is so far away, it's not even on my todo list right now. : )

December 15, 2023 09:56 PM
Calin

Usually I hate brute force, but in cases I don’t know better

That’s my approach as well. Usually brute force is the first solution that comes to your mind.

December 16, 2023 11:14 AM