
Can neural networks approximate any function given enough hidden neurons?


I understand that neural networks with any number of hidden layers can approximate nonlinear functions; however, can they approximate:

f(x) = x^2

I can't think of how it could. It seems like a very obvious limitation of neural networks that could restrict what they can do. For example, because of this limitation, neural networks probably can't properly approximate many functions used in statistics, like the exponential moving average, or even variance.

Speaking of moving averages, can recurrent neural networks properly approximate that? I understand how a feedforward neural network, or even a single linear neuron, can output a moving average using the sliding-window technique, but how would recurrent neural networks do it without X hidden layers (X being the moving-average window size)?
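For instance, something along these lines (a minimal NumPy sketch; the window size of 5 is just for illustration):

    import numpy as np

    # A single "linear neuron": output = w . x with all weights fixed to 1/N and no bias.
    # Fed a sliding window of the last N inputs, it outputs the simple moving average.
    N = 5
    w = np.full(N, 1.0 / N)

    signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    moving_avg = np.array([w @ signal[i:i + N] for i in range(len(signal) - N + 1)])
    print(moving_avg)   # [3. 4. 5. 6.]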

Also, let us assume we don't know the original function f, which happens to take the average of the last 500 inputs and output a 1 if that average is higher than 3, and a 0 if it's not. But for a second, pretend we don't know that; it's a black box.

How would a recurrent neural network approximate that? We would first need to know how many timesteps it should have, which we don't. Perhaps an LSTM network could, but even then, what if it's not a simple moving average but an exponential moving average? I don't think even an LSTM can do it.

Even worse, what if the function f(x, x1) that we are trying to learn is simply

f(x,x1) = x * x1

That seems very simple and straightforward. Can a neural network learn it? I don't see how.

Am I missing something huge here, or are machine learning algorithms extremely limited? Are there other learning techniques besides neural networks that can actually do any of this?


Solution

  • The key point to understand is compactness:

    Neural networks (like any other approximation structure, e.g., polynomials, splines, or radial basis functions) can approximate any continuous function only within a compact set.

    In other words, the theory states that, given:

    1. A continuous function f(x),
    2. A finite range for the input x, [a,b], and
    3. A desired approximation accuracy ε>0,

    then there exists a neural network that approximates f(x) with an approximation error less than ε, everywhere within [a,b].
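
    Written as a formula (where g just denotes the approximating network; the symbol is only used for this statement), the guarantee is a uniform bound over the compact interval:

        \exists\, g \ \text{such that} \quad \sup_{x \in [a,b]} \lvert f(x) - g(x) \rvert < \varepsilon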

    Regarding your example of f(x) = x^2, yes, you can approximate it with a neural network within any finite range: [-1, 1], [0, 1000], etc. To visualise this, imagine that you approximate f(x) within [-1, 1] with a step function. Can you do it on paper? Note that if you make the steps narrow enough, you can achieve any desired accuracy. The way neural networks approximate f(x) is not much different from this.
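
    Here is a small numerical sketch of that idea (plain NumPy; the step counts are just illustrative): a piecewise-constant "step" approximation of x^2 on [-1, 1], whose worst-case error keeps shrinking as the steps get narrower.

        import numpy as np

        def step_approx(x, n_steps):
            """Piecewise-constant approximation of f(x) = x**2 on [-1, 1]."""
            edges = np.linspace(-1.0, 1.0, n_steps + 1)       # boundaries of the steps
            centers = (edges[:-1] + edges[1:]) / 2            # one constant value per step
            idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_steps - 1)
            return centers[idx] ** 2                          # f evaluated at the step's center

        x = np.linspace(-1.0, 1.0, 10001)
        for n in (10, 100, 1000):
            max_err = np.max(np.abs(step_approx(x, n) - x ** 2))
            print(n, max_err)   # the worst-case error shrinks as the steps get narrower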

    But again, there is no neural network (or any other approximation structure) with a finite number of parameters that can approximate f(x) = x^2 for all x in (-∞, +∞).
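
    To make both points concrete, here is a hedged sketch assuming scikit-learn is available: a small MLP fitted to x^2 on [-1, 1] reaches a small worst-case error inside that interval, while the same fixed network ends up arbitrarily far from x^2 once you evaluate it outside the range it was fitted on.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X_train = rng.uniform(-1.0, 1.0, size=(2000, 1))   # inputs restricted to the compact set [-1, 1]
        y_train = X_train.ravel() ** 2

        # One hidden layer of tanh units; the size and solver are illustrative choices, not tuned.
        net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                           solver="lbfgs", max_iter=10000)
        net.fit(X_train, y_train)

        X_in = np.linspace(-1.0, 1.0, 1001).reshape(-1, 1)   # inside the fitted range
        X_out = np.linspace(5.0, 10.0, 1001).reshape(-1, 1)  # far outside it
        err_in = np.max(np.abs(net.predict(X_in) - X_in.ravel() ** 2))
        err_out = np.max(np.abs(net.predict(X_out) - X_out.ravel() ** 2))
        print("max error on [-1, 1]:", err_in)    # small, and can be driven down with more hidden units
        print("max error on [5, 10]:", err_out)   # huge: the tanh units saturate while x**2 keeps growing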