Background
A while ago I saw this video by SethBling about 3D rigidbody physics in Minecraft using the Separating Axis Theorem for collision detection, and the idea really resonated with me. After watching it and reading up on the concept, I used what I'd learned to add proper 2D polygon colliders to a Terraria clone I was working on at the time, a project I'd started in order to learn C# outside the context of the Unity Game Engine. I've since used this method in several projects, including an implementation of Asteroids for college. But all of the projects I've done with SAT so far have been 2D, and being the massive dweeb that I am, I've always wanted to try it in 3D. What better time to start a massive new project than a month before college starts back up again?
Getting literally anything to work
Before getting into how I've implemented things, I should probably explain what the SAT actually is. In all honesty it's pretty simple as long as you understand a few core concepts, namely the dot product of vectors. The dot product can be thought of as an operation that rotates 2 vectors together such that one of them aligns with the x axis, and returns the x component of the other vector scaled by the first vector's length. For our purposes today, we can ignore that final scaling because we can always assume at least one of our vectors is normalized and has a length of 1. The effect is that the dot product casts a "shadow" of one vector onto the other, and tells us where that shadow lands along it. The other main concept to understand is that for any 2 convex shapes that are not overlapping, there exists at least one axis onto which they can be projected such that their shadows don't overlap. That one is pretty obvious if you think about it, but the important part is that we can very easily find that axis. In fact, finding it is pretty much the entirety of the Separating Axis Theorem: it turns out that in 2D, you only need to check the axes perpendicular to each side of both objects, and you're guaranteed to find the separating axis if it exists.
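To make that concrete, here's a minimal 2D sketch of the projection test, using a bare-bones Vec2 type of my own rather than anything from the project:

```cpp
#include <cfloat>
#include <vector>

struct Vec2 { float x, y; };

// With a normalized axis, the dot product gives the signed position of
// a point's "shadow" along that axis.
float Dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Project every vertex onto the axis and keep the extents of the shadow.
void Project(const std::vector<Vec2>& verts, Vec2 axis, float& min, float& max) {
    min = FLT_MAX;
    max = -FLT_MAX;
    for (Vec2 v : verts) {
        float p = Dot(v, axis);
        if (p < min) min = p;
        if (p > max) max = p;
    }
}

// Two convex shapes overlap on this axis if their shadow intervals overlap.
// If any tested axis shows a gap, the shapes are separated.
bool OverlapsOnAxis(const std::vector<Vec2>& a, const std::vector<Vec2>& b, Vec2 axis) {
    float minA, maxA, minB, maxB;
    Project(a, axis, minA, maxA);
    Project(b, axis, minB, maxB);
    return maxA >= minB && maxB >= minA;
}
```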
Now that we have our foundation, it's pretty easy to get started implementing the system! One important consideration is that the SAT only works for convex shapes, so any complex shapes with concavities will need multiple colliders. The simplest way of dealing with this is to have each object store a std::vector<T> of colliders, but aesthetically that option leaves a lot to be desired. Instead, I opted to have a base Collider class, and have MeshCollider and CompoundCollider derive from it. That way, when an object only needs a single collider, it only needs to store that one collider, but more complex objects can have a compound of several in the same slot. This might not be the best approach, but I like it because of the aforementioned aesthetics, and it gives me the flexibility to add specialized collider types down the line, like a sphere. I don't know if that will be worth the effort, but it's nice to have the option. The main type of collider I'll be talking about going forward is the mesh collider, which has a vector of vertices and a vector of normals.
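As a rough sketch (my own illustrative declarations, assuming raylib's Vector3 for the math types), the hierarchy looks something like this:

```cpp
#include <memory>
#include <vector>
#include "raylib.h" // for Vector3

// Base class, so an object only ever needs to store one Collider.
class Collider {
public:
    virtual ~Collider() = default;
};

// A single convex shape: its vertices, plus the face normals to test as axes.
class MeshCollider : public Collider {
public:
    std::vector<Vector3> vertices;
    std::vector<Vector3> normals;
};

// Several convex pieces occupying one collider slot, for concave objects.
class CompoundCollider : public Collider {
public:
    std::vector<std::unique_ptr<Collider>> parts;
};
```

Now we just need to make a function that takes in 2 colliders, splits up any compound colliders, and then checks for an overlap.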
Now, for code I wrote in a super sleep-deprived state in a Discord call while my equally sleep-deprived friends played games and built Lego, I'd say this function turned out pretty well. It has a few problems I'll go over later, but for now it worked well enough that I was tricked into thinking it was done, based on the console-log testing I was doing. The function takes in a transformation matrix for each collider to put them into world space, and the loop breaks as soon as a separating axis is found to avoid redundant work. This may change later, as it might be necessary to complete the loop to find the exact location in 3D space where the collision occurred. I also made the function return an std::optional<T> of a hit object to pass info about the collision back out for resolution later, and created a simple struct of 2 floats representing the range of values a collider covers when projected onto a given normal.
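Here's roughly the shape that check ended up taking, as a sketch with my own names (Hit, Projection, the helpers) around the MeshCollider from before; only the face-normal axes from this stage are shown:

```cpp
#include <cfloat>
#include <optional>
#include <vector>
#include "raymath.h" // Vector3Transform, Vector3DotProduct, Vector3Subtract

struct Hit { /* collision info to pass out for resolution later */ };

// The range of values a collider covers when projected onto an axis.
struct Projection { float min, max; };

// Rotate a direction by the matrix's rotation part (cancelling the
// translation), since normals shouldn't be translated with the vertices.
Vector3 TransformDirection(Vector3 dir, Matrix m) {
    return Vector3Subtract(Vector3Transform(dir, m),
                           Vector3Transform(Vector3{ 0, 0, 0 }, m));
}

Projection Project(const std::vector<Vector3>& verts, Matrix transform, Vector3 axis) {
    Projection p{ FLT_MAX, -FLT_MAX };
    for (Vector3 v : verts) {
        float d = Vector3DotProduct(Vector3Transform(v, transform), axis);
        if (d < p.min) p.min = d;
        if (d > p.max) p.max = d;
    }
    return p;
}

std::optional<Hit> CheckCollision(const MeshCollider& a, Matrix ta,
                                  const MeshCollider& b, Matrix tb) {
    // Candidate axes: both colliders' face normals, rotated into world space.
    std::vector<Vector3> axes;
    for (Vector3 n : a.normals) axes.push_back(TransformDirection(n, ta));
    for (Vector3 n : b.normals) axes.push_back(TransformDirection(n, tb));

    for (Vector3 axis : axes) {
        Projection pa = Project(a.vertices, ta, axis);
        Projection pb = Project(b.vertices, tb, axis);
        // A gap on any axis proves separation, so bail out early.
        if (pa.max < pb.min || pb.max < pa.min)
            return std::nullopt;
    }
    return Hit{}; // no separating axis found
}
```

With this done, I can finally start making some objects to attach these colliders to and draw to the screen!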
Getting something to render
The first step of getting something drawn to the screen is creating a class to contain all the data related to rendering and simulating a given object. This class doesn't need a whole lot in it at the moment, just some transformation matrices, a mesh, material, and shader for rendering, and a collider. In terms of methods, all it's got is some setters and getters for the data, and functions for logic updates and rendering (Update() and Draw() respectively).
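Something like this, give or take (a sketch; PhysicsObject and the member layout are my own names around raylib's types):

```cpp
#include "raylib.h"

class PhysicsObject {
public:
    void Update(float dt) { /* logic/physics step goes here */ }
    void Draw() { DrawMesh(mesh, material, transform); } // raylib draw call

    // ...setters and getters for the data below...

private:
    Matrix    transform; // position/rotation/scale in world space
    Mesh      mesh;
    Material  material;
    Shader    shader;    // assigned into material.shader before drawing
    Collider* collider;
};
```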
I also took the time to set up a World class. There's not much in there at the moment, but it lets me bundle up all my global data and internal logic in a much more organized place. From there, all I needed to do was loop through all the objects in the world and call their update and draw methods, while also calling the collision check method against every other object. It's not optimized or efficient; at the moment I'm looping through the objects 3 separate times (plus one nested loop) for simplicity.
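In sketch form, with the same hypothetical PhysicsObject as before, the per-frame logic is just three passes:

```cpp
#include <vector>
#include "raylib.h" // GetFrameTime

class World {
public:
    void Tick() {
        // Pass 1: logic updates.
        for (auto& obj : objects) obj.Update(GetFrameTime());

        // Pass 2: naive O(n^2) pairwise collision checks.
        for (size_t i = 0; i < objects.size(); ++i)
            for (size_t j = i + 1; j < objects.size(); ++j)
                CheckPair(objects[i], objects[j]);

        // Pass 3: rendering.
        for (auto& obj : objects) obj.Draw();
    }

private:
    void CheckPair(PhysicsObject& a, PhysicsObject& b); // wraps the SAT check
    std::vector<PhysicsObject> objects;
};
```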
At this point it should have been time to move on to implementing collision resolution, but you may remember I mentioned that I wrote a good chunk of the code under what might be called "sub-optimal conditions". That's right, my sleep-deprived code I thought was working is now coming back to bite me in the behind. I might not have noticed a lot of these issues if I hadn't swapped one of the cubes I was testing with for another shape as a sanity check, because it just so happened that most of my mistakes involved accidentally discarding the results of transformations, in several fun flavours. Chief among those was calling raylib's vector and matrix transformation functions and forgetting to actually assign the returned result to anything. I did this in a few different places, alongside some spots where I tried assigning to the by-value loop variable in range-based for loops over vectors of points.
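In miniature, the two flavours of bug looked something like this (illustrative code, not the actual lines from the project):

```cpp
#include <vector>
#include "raymath.h"

void TransformPoints(std::vector<Vector3>& points, Vector3& position, Matrix transform) {
    // Flavour 1: raymath functions return by value, so forgetting to
    // assign the result silently throws the transformation away.
    Vector3Transform(position, transform);            // wrong: result discarded
    position = Vector3Transform(position, transform); // right

    // Flavour 2: a by-value range-for variable is a copy, so assigning
    // to it never touches the vector's actual elements.
    for (Vector3 v : points) v = Vector3Transform(v, transform);  // wrong: mutates a copy
    for (Vector3& v : points) v = Vector3Transform(v, transform); // right: reference
}
```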
There was quite a lot of fixing to be done, and in the process I also discovered that the web builds using Emscripten were completely borked, which led to even more fixing. Unfortunately for me, that wasn't even the end of it.
You might have noticed that all the way back in my explanation of the Separating Axis Theorem, I specifically talked about how you can find the axes that need to be tested in 2D, but didn't say anything about how that might be different in 3D. That's because I had completely forgotten that it was different. I don't know that I can blame this mistake on a lack of sleep, but I will use the excuse that it had been at least a year and a half since I had seen the video explaining the concept in 3D. The thing is, using only the normals of the objects works in most cases where 2 cubes are being tested, even if they are rotated. The problem happens when the cubes are rotated in odd but common ways relative to one another, and the overlap detection returns a false positive. To get rid of this issue (and presumably similar issues with more complex shapes), you need to take the cross product of every edge on one shape with every edge on the other shape, and check the resulting direction vectors on top of the face normals. Since the cross product returns a new vector perpendicular to both inputs, and the false positives happen when 2 edges are oriented weirdly relative to one another, it makes sense that checking these axes would eliminate the issue. With the problem identified, it wasn't that big a deal to actually solve it. The main change was that colliders need to store their edges, which I handled by storing a vector of index pairs that refer into the vector of vertices.
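Sketched out (reusing the earlier illustrative MeshCollider, now assumed to also carry an edges member of vertex-index pairs, plus the TransformDirection helper from before), gathering the full 3D axis set looks something like:

```cpp
#include <utility>
#include <vector>
#include "raymath.h"

// World-space direction of an edge stored as a pair of indices into the
// collider's vertex list.
Vector3 EdgeDirection(const MeshCollider& c, std::pair<int, int> e, Matrix m) {
    Vector3 p0 = Vector3Transform(c.vertices[e.first], m);
    Vector3 p1 = Vector3Transform(c.vertices[e.second], m);
    return Vector3Subtract(p1, p0);
}

std::vector<Vector3> GatherAxes(const MeshCollider& a, Matrix ta,
                                const MeshCollider& b, Matrix tb) {
    std::vector<Vector3> axes;

    // The face normals, as before.
    for (Vector3 n : a.normals) axes.push_back(TransformDirection(n, ta));
    for (Vector3 n : b.normals) axes.push_back(TransformDirection(n, tb));

    // Plus the cross product of every edge pair across the two shapes.
    for (const auto& ea : a.edges) {
        Vector3 da = EdgeDirection(a, ea, ta);
        for (const auto& eb : b.edges) {
            Vector3 db = EdgeDirection(b, eb, tb);
            Vector3 axis = Vector3CrossProduct(da, db);
            // Near-parallel edges give a near-zero cross product; skip those.
            if (Vector3Length(axis) > 1e-6f)
                axes.push_back(Vector3Normalize(axis));
        }
    }
    return axes;
}
```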
A fun side effect of solving this issue was that I was able to render colliders using raylib's DrawLine3D() function, which is a lot nicer for visualizing collisions than my previous solution of changing the colour of the whole object. And with that lengthy detour out of the way, it's finally time to move on to using the information from the SAT to resolve the collisions!
Everything thus far was written after the fact, because I hadn't decided to document the making of this project in this much detail yet, but from here on each segment will be written as I go through them.
Course correcting
At this point I figured it would be useful to have some ability to inspect and modify objects at runtime. Obviously I don't need any sort of reflection system for a small personal project like this, but being able to see and modify the transform data of a physics object without recompiling the project would be very nice. So, I decided that before moving on to collision resolution, I would implement raycasting to select objects in the scene, and bring in Dear ImGui to display relevant information. I had never used ImGui before, but I had heard about it in passing and thought it a great fit for my purposes. After some struggling with outdated documentation and git hashes, I was able to get it up and running with the help of a nice compatibility library called rlImGui (the basic hookup is sketched below). Now that I had ImGui ready to use, it was time for raycasting, and this is where the title of this section comes into play. You see, up to this point I had kind of been going off vibes and what little knowledge I already had about 3D physics in games. I had some idea of how it might work, and thought I could work it out as long as I had my Separating Axis Theorem implementation in place to support it all. I found out just how wrong I was the moment I began to look into how raycasting works, something I had no idea about whatsoever.
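For reference, the basic rlImGui hookup looks roughly like this (based on rlImGui's own examples; the window size and title are placeholders):

```cpp
#include "raylib.h"
#include "imgui.h"
#include "rlImGui.h"

int main() {
    InitWindow(1280, 720, "ImGui test");
    rlImGuiSetup(true); // true = use the dark theme

    while (!WindowShouldClose()) {
        BeginDrawing();
        ClearBackground(DARKGRAY);

        rlImGuiBegin();          // start the ImGui frame
        ImGui::ShowDemoWindow(); // any ImGui widgets go between Begin/End
        rlImGuiEnd();            // render ImGui on top of the raylib frame

        EndDrawing();
    }

    rlImGuiShutdown();
    CloseWindow();
    return 0;
}
```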
After some googling and digging through forum posts, I was able to find this document going over raycasting, and this document on 3D physics. In the second one (a document from Valve), the author explains very early on that they use the Half-Edge data structure, AKA the Doubly Connected Edge List, and it became very apparent why as soon as I started to read the first document. The Half-Edge data structure makes it much easier to represent and iterate over adjacencies like "all the edges that make up a face" or "all connected faces". I had already started to move in that direction when I realized previously that I needed to keep track of edges and which vertices make them up, but I didn't quite think about how else I might need to care about the relations between elements of the polyhedron. So, I decided to read up on the data structure and implement my own version. At first I made something very 'by the books', with loosely connected Edge, Face, and Vertex objects related only by pointers. This wasn't very convenient for my purposes, and I didn't like how nothing was guaranteed to be next to anything else in memory, so I instead changed the structure to live inside a 'container' class as a series of vectors. Then, each object can store a single pointer for each vector it needs to access, plus the index of each object it's related to. As an example, the all-important Edge class looks something like this (a sketch of the idea, with illustrative names):
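```cpp
#include <cstddef>
#include <vector>

class Vertex;
class Face;

class Edge {
public:
    // One pointer per container vector this edge needs to reach...
    std::vector<Vertex>* vertices;
    std::vector<Face>*   faces;
    std::vector<Edge>*   edges;

    // ...and indices identifying the elements it's related to.
    size_t origin; // the vertex this half-edge starts from
    size_t face;   // the face this half-edge borders
    size_t twin;   // the oppositely-directed partner half-edge
    size_t next;   // the next half-edge around the face
};
```

Storing indices into the container's vectors instead of raw pointers keeps the elements packed together in memory while still letting each edge find its neighbours.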
I'll save you the exhaustive detail, but trying to integrate this system was a massive headache that took several days of work before I was even able to make sure I'd structured it all correctly. I spent many hours getting segfaults and seemingly inexplicable errors that ultimately boiled down to either not understanding how the std::map<T> structure stores its members, or looking in the completely wrong part of the code. Perhaps the funniest example of the latter was towards the end of the process, when I spent nearly 8 hours debugging in search of one problem and solving about a dozen others along the way, only to realize in the end that I was simply providing the vertices in the wrong order, so none of the half-edges were being paired with their opposites, since the would-be twins were pointing in the same direction. After finally solving that, the raycasting itself was almost child's play by comparison, and only took me a day or 2 of work to get fully functional.
From there, it was a simple matter to do a raycast from the mouse, store a pointer to whatever object (if any) was hit, and display its position in an ImGui window. That info gets displayed as a series of input fields, and the temporary Vector3 ImGui uses can be fed back to the object at the end of the tick to change its position at runtime!
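Sketched out, that flow looks something like this; World::Raycast, GetPosition, and SetPosition are hypothetical stand-ins for whatever the real methods are called:

```cpp
// Inside the frame loop, after input handling. 'camera' is the scene
// camera and 'world' owns the objects.
Ray ray = GetMouseRay(GetMousePosition(), camera);
PhysicsObject* selected = world.Raycast(ray); // nullptr if nothing was hit

if (selected) {
    Vector3 pos = selected->GetPosition();
    float edit[3] = { pos.x, pos.y, pos.z };

    ImGui::Begin("Inspector");
    if (ImGui::InputFloat3("Position", edit)) {
        // Feed the edited values back at the end of the tick.
        selected->SetPosition(Vector3{ edit[0], edit[1], edit[2] });
    }
    ImGui::End();
}
```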