Very fast object allocator for small object of same size

I just have to write some code that has to have best performance.

Requirements:

I need a very fast object allocator for quick creation of object. My object as only 3 doubles in it. Allocation and deallocation will occurs one object at a time only.

I made a lots of research and come up with:

std::vector<MyClass, boost::fast_pool_allocator<MyClass>>

I wonder (in 2014-07):

Does stl have something equivalent to boost::boost:fast_pool_allocator ?
Is there a better solution to what I have found ?

There is additional information to answer some comments:

The code will be used to optimize my algorithm for: Code Project article on Convex Hull
I need to convert C# code to C or C++ to improve performance. I should compete against another algorithm written in pure "C". I just discover that my comparison chart in my article have errors because I tested against code compiled in C for x86-Debug. In x64-release the "C" code is a lot faster (a factor of 4 to 5 times faster than in x86-debug).
According to This Boost documentation and this Answer at StackOverFlow, boost:fast_pool_allocator seems to be the best allocator to use for small memory chunk of same size query one by one. But I would like to make sure nothing else exists that is either more standard (part of stl) or faster.
My code will be developed on Visual Studio 2013 and target any windows platform (no phone or tablet).
My intend is not to have fast code, it is to have the fastest code. I prefer not having too much twisted code if possible and also look for code that is maintainable (at least a minimum).
If possible, I also would like to know the impact of using std:vector vs array (ie: []).
For more info, you could see Wikipedia - Object pool pattern

Solution

The closest thing to what I was looking for was Paulo Zemek Code Project article: O(1) Object Pool in C++.

But I finally did allocate/reserve memory of size= the maximum size possible * my object size. Because I was not using any objects that required to live longer than my algorithm loop, I cheated by saying that location in reserved memory space was object. After my algorithm loop, I flushed the reserved memory space. It appears to me to be the fastest. Very not elegant but extremely fast and only require one allocation and one deallocation.

I was not totally satisfy with the answer, that's why I answered myself. I also added a comments to the questions about every comments and added this answer to make thinks clear. I know that my decision/implementation was not totally in accordance with the question but I think that it should have conducted to something similar.