Transformation of vertex data. Mouse picking algorithm

In my program I have a gizmo which moves any objects in the scene. As I already know, is the usual way of storing any transformations is store that transformation in model matrices of this objects and execute any transformation directly in the shader. BUT also in my program I implement a classic ray-picking algorithm which works only with a real transformed data. A ray detect any intersection with real(transformed) vertex position. How is the common way to solve this conflict:

Multiply any transformation immediately on CPU and store transformed data. I think it's a clear way but it's expensive: for example I drag my object on screen in during 100 frames, and each frame I convert the delta of moving to matrices and multiply whole data by it.
Store any transformation in matrices until the mouse picking will starts and then quick multiply vertices by matrix to prepare data for picking. This is very fancy but there is ways to optimize it.

Which is the more performant way. Maybe there is some other method?

Update for Robinson:

I think you misunderstood me. Or I did not fully understand you. I have a box and a sphere and I move it by gizmo (I edit their model matrix) on 1,0,0 and 0,1,0 respectively. His model matrix is now different. HERE I get data that I need for ray-picking - ever objects has own individual place.

Then I transform the entire scene to eye space (view matrix) and then to clip space (projection matrix) and render it. My ray makes return journey from viewport to world space (unproject a view and a projection matrix) and should interacts with the actual scene. My ray transformed rather than scene!

My question was how to interact with the objects which the real place is unknown until it's will render (or transformed)? Or may be I'm not on the right track and I should have done it differently - multiply entire data each step (it's expensive, look at my first question).

Solution

You use ray-picking which technically is "get x,y screen coordinates, transform them to NDC and set the z as anyone in the range [-1,1]; and finally transform them all back to world coordinates".

This is useful when you want to intersect a ray from the point of view (the camera) to "mouse coordinates" AND you want to do all of this intersection calculations on CPU side.
Notice you can do it even when nothing is drawn in the screen, just mouse coordinates are needed; well, plus the viewport and the current transformations, but you know them before any glDrawxxx command.

Next, the question is: what are you going to do with that ray or intersections?

You may wish to modify some property (like color) or position. Right?
How many objects are to be modified? If it's just a bunch then it's OK to do it on CPU modifying the data to send to GPU. But if you have thousands of objects then think of the hardware-accelerated way: keep their coordinates but send the new tranformation matrices and properties to GPU and let it do the hard work.

If you are worried about some objects stay as before but others get modified, remember that you can draw groups of objects that share the matrices and other uniforms with a single glDrawxxx call. If you have different groups, use several glDrawxxx calls with different uniforms, even different shaders.