How can I improve my file writing method to reduce Wavefront Object file size?

I am trying to write out the voxelization of a model to a Wavefront Object File.

My method is simple, and runs in reasonable time. The problem is - it produces OBJ files that are ludicrous in size. I tried to load a 1 GB file into 3D Viewer on a very respectable machine with an SSD but in some cases the delay was several seconds when trying to move the camera, in others it refused to do anything at all and effectively softlocked.

What I've done so far:

I do not write out any faces which are internal to the model - that is, faces between two voxels which are both going to be written to the file. There's no point, as no-one can see them.
Because OBJ does not have a widely-supported binary format (as far as I know), I found that I could save some space by trimming trailing zeros from vertex positions in the file.

The obvious space-save I don't know how to do:

Not writing out duplicate vertices. In total, there are around 8x more vertices in the file than there should be. However, fixing this is extremely tricky because objects in Wavefront Object files do not use per-object, but global vertices. By writing out all 8 vertices each time, I always know which 8 vertices make up the next voxel. If I do not write out all 8, how do I keep track of which place in the global list I can find those 8 (if at all).

The harder, but potentially useful large space-save:

If I could work more abstractly, there may be a way to combine voxels into fewer objects, or combine faces that lie along the same plane into larger faces. IE, if two voxels both have their front face active, turn that into one larger rectangle twice as big.

Because it's required, here's some code that roughly shows what is happening. This isn't the code that's actually in use. I can't post that, and it relies on many user-defined types and has lots of code to handle edge cases or extra functionality so would be messy and length to put up here anyways.

The only thing that's important to the question is my method - going voxel-by-voxel, writing out all 8 vertices, and then writing out whichever of the 6 sides is not neighboring an active voxel. You'll just kind of have to trust me that it works, though it does produce large files.

My question is what method or approach I can use to reduce the size further. How can I, for example, not write out any duplicate vertices?

Assumptions:

Point is just an array of size 3, with getters like .x()
Vector3D is a 3D wrapper around std::vector, with a .at(x,y,z) method
Which voxels are active is arbitrary and does not follow a pattern, but is known before writeObj is called. Fetching if a voxel is active at any position is possible and fast.

//Left, right, bottom, top, front, rear
static const std::vector<std::vector<uint8_t>> quads = {
    {3, 0, 4, 7}, {1, 2, 6, 5}, {3, 2, 1, 0},
    {4, 5, 6, 7}, {0, 1, 5, 4}, {2, 3, 7, 6}};

void writeOBJ(
    std::string folder,
    const std::string& filename,
    const Vector3D<Voxel>& voxels,
    const Point<unsigned> gridDim,
    const Point<unsigned>& voxelCenterMinpoint,
    const float voxelWidth)
{
  unsigned numTris = 0;
  std::ofstream filestream;
  std::string filepath;
  std::string extension;
  ulong numVerticesWritten = 0;

  // Make sure the folder ends with a '/'
  if (folder.back() != '/')
    {
      folder.append("/");
    }

  filepath = folder + filename + ".obj";

  filestream.open(filepath, std::ios::out);

  // Remove the voxelization file if it already exists
  std::remove(filepath.c_str());

  Point<unsigned> voxelPos;

  for (voxelPos[0] = 0; voxelPos[0] < gridDim.x(); voxelPos[0]++)
    {
      for (voxelPos[1] = 0; voxelPos[1] < gridDim.y(); voxelPos[1]++)
        {
          for (voxelPos[2] = 0; voxelPos[2] < gridDim.z(); voxelPos[2]++)
            {
              if (voxels.at(voxelPos)) 
                {
                  writeVoxelToOBJ(
                      filestream, voxels, voxelPos, voxelCenterMinpoint, voxelWidth,
                      numVerticesWritten);
                }
            }
        }
    }

  filestream.close();
}

void writeVoxelToOBJ(
    std::ofstream& filestream,
    const Vector3D<Voxel>& voxels,
    const Point<unsigned>& voxelPos,
    const Point<unsigned>& voxelCenterMinpoint,
    const float voxelWidth,
    ulong& numVerticesWritten)
{
  std::vector<bool> neighborDrawable(6);
  std::vector<Vecutils::Point<float>> corners(8);
  unsigned numNeighborsDrawable = 0;

  // Determine which neighbors are active and what the 8 corners of the
  // voxel are
  writeVoxelAux(
      voxelPos, voxelCenterMinpoint, voxelWidth, neighborDrawable,
      numNeighborsDrawable, corners);

  // Normally, if all neighbors are active, there is no reason to write out this
  // voxel. (All its faces are internal) If inverted, the opposite is true.
  if (numNeighborsDrawable == 6)
    {
      return;
    }

  // Write out the vertices
  for (const Vecutils::Point<float>& corner : corners)
    {
      std::string x = std::to_string(corner.x());
      std::string y = std::to_string(corner.y());
      std::string z = std::to_string(corner.z());

      // Strip trailing zeros, they serve no prupose and bloat filesize
      x.erase(x.find_last_not_of('0') + 1, std::string::npos);
      y.erase(y.find_last_not_of('0') + 1, std::string::npos);
      z.erase(z.find_last_not_of('0') + 1, std::string::npos);

      filestream << "v " << x << " " << y << " " << z << "\n";
    }

  numVerticesWritten += 8;

  // The 6 sides of the voxel
  for (uint8_t i = 0; i < 6; i++)
    {
      // We only write them out if the neighbor in that direction
      // is inactive
      if (!neighborDrawable[i])
        {
          // The indices of the quad making up this face
          const std::vector<uint8_t>& quad0 = quads[i];

          ulong q0p0 = numVerticesWritten - 8 + quad0[0] + 1;
          ulong q0p1 = numVerticesWritten - 8 + quad0[1] + 1;
          ulong q0p2 = numVerticesWritten - 8 + quad0[2] + 1;
          ulong q0p3 = numVerticesWritten - 8 + quad0[3] + 1;

          // Wavefront object files are 1-indexed with regards to vertices
          filestream << "f " << std::to_string(q0p0) << " "
                     << std::to_string(q0p1) << " " << std::to_string(q0p2)
                     << " " << std::to_string(q0p3) << "\n";
        }
    }
}

void writeVoxelAux(
    const Point<unsigned>& voxelPos,
    const Point<unsigned>& voxelCenterMinpoint,
    const float voxelWidth,
    std::vector<bool>& neighborsDrawable,
    unsigned& numNeighborsDrawable,
    std::vector<Point<float>>& corners)
{
  // Which of the 6 immediate neighbors of the voxel are active?
  for (ulong i = 0; i < 6; i++)
    {
      neighborsDrawable[i] = isNeighborDrawable(voxelPos.cast<int>() + off[i]);

      numNeighborsDrawable += neighborsDrawable[i];
    }

  // Coordinates of the center of the voxel
  Vecutils::Point<float> center =
      voxelCenterMinpoint + (voxelPos.cast<float>() * voxelWidth);

  // From this center, we can get the 8 corners of the triangle
  for (ushort i = 0; i < 8; i++)
    {
      corners[i] = center + (crnoff[i] * (voxelWidth / 2));
    }
}

Addendum:

While I ultimately ended up doing something like what @Tau suggested, there was one key difference - the comparison operator.

For points represented by 3 floats, < and == is not sufficient. Even using tolerances on both, it does not work consistently and had discrepancies between my debug and release mode.

I have a new method I will post here when I can, though even it is not 100% foolproof.

Solution

If you define a custom comparator like this:

struct PointCompare
{
  bool operator() (const Point<float>& lhs, const Point<float>& rhs) const
  {
    if (lhs.x() < rhs.x()) // x position is most significant (arbitrary)
      return true;
    else if (lhs.x() == rhs.x()) {
      if (lhs.y() < rhs.y())
        return true;
      else if (lhs.y() == lhs.y())
        return lhs.z() < rhs.z();
    }
  }
};

you could then make a map from Points to their index in a vector, and whenever you use a vertex in a face, check if it already exists:

std::vector<Point> vertices;
std::map<Point, unsigned, PointCompare> indices;

unsigned getVertexIndex(Point<float>& p) {
  auto it = indices.find(p);
  if (it != indices.end()) // known vertex
    return it->second;
  else { // new vertex, store in list
    unsigned pos = vertices.size();
    vertices.push_back(p);
    indices[p] = pos;
    return pos;
  }
}

Compute all faces using this, then write the vertices to file, then the faces.

Combining voxel faces optimally is indeed somewhat more complicated than this, but if you want to have a go at it, check this out.

Alternatively, if you are handling only a few meshes overall, you may want to save yourself the hassle optimizing your code, and use the free MeshLab, which can remove duplicate vertices, merge faces and export to a variety of (more efficient) formats with just a few clicks.

Btw: Storing your voxels in a list is only efficient if they are really sparse; using a bool[][][] instead will be more efficient in most cases and really simplyfy your algorithms (e. g. for finding neighbors).