OpenGL: How does updating buffers affect speed?

I have a vertex buffer that I map and fill with vertex attribute data. Here is the basic functionality of the code:

glBindBuffer(GL_ARRAY_BUFFER, _bufferID);
// GL_WRITE_ONLY: the returned pointer is intended for writing only
_buffer = (VertexData*)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

for(Renderable* renderable : renderables){
    const glm::vec3& size = renderable->getSize();
    const glm::vec3& position = renderable->getPosition();
    const glm::vec4& color = renderable->getColor();
    const glm::mat4& modelMatrix = renderable->getModelMatrix();
    glm::vec3 vertexNormal = glm::vec3(0, 1, 0);

    // transform the vertex into world space on the CPU
    _buffer->position = glm::vec3(modelMatrix * glm::vec4(position.x, position.y, position.z, 1));
    _buffer->color = color;
    _buffer->texCoords = glm::vec2(0, 0);
    _buffer->normal = vertexNormal;
    _buffer++;  // advance to the next vertex slot
}

// the buffer must be unmapped before a draw call can source vertices from it
glUnmapBuffer(GL_ARRAY_BUFFER);

I then draw all renderables in one draw call. I am curious why touching the _buffer pointer at all (beyond the writes above) causes a massive slowdown in the program. For example, if I call std::cout << _buffer->position.x; every frame, my FPS drops to about a quarter of what it usually is.

What I want to know is why this happens. The reason I ask is that I want to be able to translate individual objects in the batch when they are moved. Essentially, I want the buffer to stay in place, but still be able to change parts of it without a huge performance cost. I assume this isn't possible, but I would like to know why. Here is an example of what I would want to do if this didn't cause massive issues:

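if(renderables.at(index)->hasChangedPosition()){
    // use a temporary pointer so _buffer keeps pointing at the start of the mapping
    VertexData* vertex = _buffer + index;
    vertex->position = renderables.at(index)->getPosition();
}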
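The per-entry update above comes down to pointer arithmetic: entry `index` starts `index * sizeof(VertexData)` bytes into the buffer. Below is a minimal sketch under stated assumptions: `VertexData` here is a hypothetical float-only stand-in for the glm-based struct, `vertexOffset` and `setPosition` are hypothetical helper names, and a plain array stands in for the mapped pointer so it runs without a GL context. Note that the helper only writes through the pointer and never reads from it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's VertexData. The real struct uses glm
// types; plain float arrays keep this sketch self-contained.
struct VertexData {
    float position[3];
    float color[4];
    float texCoords[2];
    float normal[3];
};

// Byte offset of entry `index` from the start of the buffer. This is the value
// a partial update such as glBufferSubData would take as its offset argument.
std::size_t vertexOffset(std::size_t index) {
    return index * sizeof(VertexData);
}

// Writes a new position into entry `index` through a temporary pointer, so the
// base pointer keeps pointing at the start of the buffer for later updates.
// Only writes happen here; nothing is read back through `base`.
void setPosition(VertexData* base, std::size_t index, float x, float y, float z) {
    assert(base != nullptr);  // glMapBuffer returns null if mapping failed
    VertexData* vertex = base + index;
    vertex->position[0] = x;
    vertex->position[1] = y;
    vertex->position[2] = z;
}
```

For example, `setPosition(base, 2, 1.0f, 2.0f, 3.0f)` changes only entry 2, and `vertexOffset(2)` equals `2 * sizeof(VertexData)`.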

I am aware I can send the transforms through a shader uniform, but that doesn't work for objects batched into a single draw call.

Solution

why touching the _buffer variable at all causes massive slow down in the program

...well, you did request a GL_WRITE_ONLY mapping. It's entirely possible that the GL driver set up the memory pages backing the pointer returned by glMapBuffer() with a custom fault handler that, whenever you read from them, actually goes out to the GPU to fetch the requested bytes, which can be...not fast.

Whereas if you only write to the provided addresses, the driver/OS doesn't have to do anything until the glUnmapBuffer() call, at which point it can set up a nice, fast DMA transfer to blast the new buffer contents out to GPU memory in one go.
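One common way to get what the question asks for (moving individual objects in the batch) while respecting the "only write" rule is to keep the authoritative vertex data in ordinary CPU memory, do all reads and edits there, and send only the changed range to the GPU. This is a sketch under assumptions, not the only approach: `Vertex`, `VertexBatch`, `setPosition` and `flush` are hypothetical names, and the upload callback stands in for `glBufferSubData` (a different API from the `glMapBuffer` call in the question) so the sketch runs without a GL context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-vertex layout (plain floats instead of glm for brevity).
struct Vertex {
    float position[3];
    float color[4];
};

// Keeps the authoritative copy of the vertices in CPU memory, so the program
// never needs to read back through a mapped GPU pointer.
class VertexBatch {
public:
    // upload(offsetBytes, sizeBytes, data): in a GL program this would wrap
    // glBufferSubData(GL_ARRAY_BUFFER, offset, size, data).
    using UploadFn = std::function<void(std::size_t, std::size_t, const void*)>;

    explicit VertexBatch(std::size_t count) : vertices_(count) {}

    // Edits go to the CPU copy; the touched index widens the dirty range.
    void setPosition(std::size_t index, float x, float y, float z) {
        Vertex& v = vertices_.at(index);
        v.position[0] = x;
        v.position[1] = y;
        v.position[2] = z;
        dirtyBegin_ = std::min(dirtyBegin_, index);
        dirtyEnd_ = std::max(dirtyEnd_, index + 1);
    }

    // Reads are served from CPU memory, never from the GPU buffer.
    const Vertex& get(std::size_t index) const { return vertices_.at(index); }

    // Sends one contiguous write covering every changed vertex, then resets.
    void flush(const UploadFn& upload) {
        if (dirtyBegin_ >= dirtyEnd_) return;  // nothing changed since last flush
        assert(dirtyEnd_ <= vertices_.size());
        const std::size_t offset = dirtyBegin_ * sizeof(Vertex);
        const std::size_t size = (dirtyEnd_ - dirtyBegin_) * sizeof(Vertex);
        upload(offset, size, vertices_.data() + dirtyBegin_);
        dirtyBegin_ = kNone;
        dirtyEnd_ = 0;
    }

private:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);
    std::vector<Vertex> vertices_;
    std::size_t dirtyBegin_ = kNone;
    std::size_t dirtyEnd_ = 0;
};
```

In a render loop one would call `flush` once per frame before the draw call, with the vertex buffer bound, passing a lambda such as `[](std::size_t offset, std::size_t size, const void* data) { glBufferSubData(GL_ARRAY_BUFFER, offset, size, data); }`. Uploading a single min-to-max range is a deliberate simplification: if two changed vertices are far apart, the untouched vertices between them are re-sent as well.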