
Simultaneity of ANN

Started by sooner123, December 15, 2010 10:00 AM
11 comments, last by taby 13 years, 7 months ago
Most neural networks (as far as I'm aware) are a series of layers and the signal propagates in one direction at any given time.

What about a neural network intended to operate similarly to an actual brain, where all neurons are receiving and sending signals at simultaneous and overlapping times?

How can this be simulated in software without dedicating one CPU to each neuron?

How do we get true (or almost true) simultaneity in software? Iterating over a list of neurons or connections means some firings and their consequences get registered before other firings that, in reality, would have been registered first.

[Edited by - sooner123 on December 15, 2010 10:17:17 AM]
- for each Connection, propagate the output of the FromNode into a temporary buffer in the ToNode

- for each Node, apply the node's activation function to the buffer and save the result as the node's output

Is this simultaneous? Each activation pushes all data a single step through the ANN. No firings are registered out of order.
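
Here is a minimal sketch of that two-pass update in C++; the Node/Connection structures and names are just illustrative, not from any particular library:

```cpp
#include <vector>
#include <cmath>

// Illustrative structures; names are made up for this sketch.
struct Node {
    double output = 0.0;  // value visible to outgoing connections this step
    double buffer = 0.0;  // accumulates incoming signal for the next step
};

struct Connection {
    int from;
    int to;
    double weight;
};

double activation(double x) { return std::tanh(x); }  // any activation will do

void step(std::vector<Node>& nodes, const std::vector<Connection>& conns) {
    // Pass 1: every connection reads the *old* outputs, so iteration order doesn't matter.
    for (const Connection& c : conns)
        nodes[c.to].buffer += nodes[c.from].output * c.weight;

    // Pass 2: commit all new outputs at once.
    for (Node& n : nodes) {
        n.output = activation(n.buffer);
        n.buffer = 0.0;
    }
}
```

Because pass 1 only reads outputs committed in the previous step and pass 2 commits all new outputs together, the result is independent of iteration order, which is exactly the simultaneity the original post asks about.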
Timing isn't the problem.

http://en.wikipedia.org/wiki/Vector_clock

Dependency resolution and concurrency are other topics you might want to look into.

Thankfully, your questions are answered in the course of getting a CS degree, which means that you were asking the right ones :)
There are a few different implementations of ANNs that use time as a major component. You might google 'Recurrent neural network'. However, if you are really interested in getting closer to how the brain operates, take a look at spiking neural networks. Please understand, however, that spiking neural networks are not likely to produce anything useful in terms of game or character controllers for quite a while. The research in that field is still fairly fundamental.

As far as how to simulate it: there are only a few operations you need to perform per neuron, so there is definitely no need for one CPU per neuron. However, computation time will increase drastically as the size of the network and the number of connections grow.

The simultaneity is kind of a non-issue. Each neural connection has a propagation delay. Each frame of execution, you can check the input connections for spikes that occurred [DELAY] frames ago on each connection. Sum the weights of the connections that spiked and apply the neuron's internal function to determine whether it spikes. If it does, it takes at least one frame for that spike to propagate to the next neuron.
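
A rough sketch of that per-frame loop, assuming each connection keeps its own small queue of pending spikes; the structures and names are mine, and leaky integration and refractory periods are omitted for brevity:

```cpp
#include <vector>
#include <deque>

// Illustrative spiking-neuron sketch: each connection delivers a spike
// `delay` frames after it was emitted.
struct SpikingConnection {
    int from;
    int to;
    double weight;
    int delay;                  // propagation delay in frames
    std::deque<bool> pipeline;  // spike history, oldest at the front
};

struct SpikingNeuron {
    double input = 0.0;     // summed weighted spikes arriving this frame
    double threshold = 1.0;
    bool spiked = false;    // result of the previous frame
};

void frame(std::vector<SpikingNeuron>& neurons,
           std::vector<SpikingConnection>& conns) {
    for (auto& n : neurons) n.input = 0.0;

    // Push last frame's spikes into each connection and deliver the ones
    // that were emitted `delay` frames ago.
    for (auto& c : conns) {
        c.pipeline.push_back(neurons[c.from].spiked);
        if ((int)c.pipeline.size() > c.delay) {
            if (c.pipeline.front()) neurons[c.to].input += c.weight;
            c.pipeline.pop_front();
        }
    }

    // Decide who spikes this frame; the result only becomes visible next frame.
    for (auto& n : neurons)
        n.spiked = (n.input >= n.threshold);
}
```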

There are some examples of spiking neural networks being used to control robots, and a couple of training methods have been defined. But I have not seen any application yet for which spiking neural networks are the best algorithm. That may have to wait until we figure out how the brain works :-). So it's just a matter of time.
Quote: Original post by sooner123
Most neural networks (as far as I'm aware) are a series of layers and the signal propagates in one direction at any given time.


Artificial neural networks are often taught from the vantage point of the backpropagation learning algorithm.

Most backprop implementations require a chain of layers, where each layer feeds directly only into the next adjacent layer. This is only to facilitate the learning algorithm, though, and has nothing to do with the perceptron itself.

With regards to firing all at once:
For all intents and purposes, each collection of nodes in a 'layer' of a feed-forward network executes in parallel. Each node in a layer simply acts as an accumulator until there are no more values to accumulate. The order in which values are accumulated is irrelevant. [edited to be more clear]

Regarding connections:
Go ahead and connect nodes arbitrarily to any other node. For a feed-forward network this is fine provided you don't ever have a node feeding a node 'behind' it in the node list.
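
For example, here is a sketch of evaluating such an arbitrarily connected feed-forward network in node-list order; the Link structure and helper are made up for illustration, and it assumes the first entries in the node list are the inputs:

```cpp
#include <vector>
#include <cassert>
#include <cmath>

struct Link { int from; int to; double weight; };  // illustrative

// Evaluate nodes in list order. This is only valid if every link points
// forward (from < to), i.e. no node feeds a node 'behind' it in the list.
std::vector<double> evaluate(int nodeCount,
                             const std::vector<Link>& links,
                             const std::vector<double>& inputs) {
    std::vector<double> value(nodeCount, 0.0);
    for (size_t i = 0; i < inputs.size(); ++i)
        value[i] = inputs[i];  // assumption: the first nodes are the inputs

    for (const Link& l : links)
        assert(l.from < l.to && "connection must point forward in the node list");

    for (int n = (int)inputs.size(); n < nodeCount; ++n) {
        double sum = 0.0;
        for (const Link& l : links)
            if (l.to == n) sum += value[l.from] * l.weight;
        value[n] = std::tanh(sum);  // any activation
    }
    return value;
}
```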

If you do plan on forgoing the backprop route:
You don't need to use a sigmoid (or any other nonlinear function) for your activation function. The nonlinear activation function is only there to facilitate backprop. You can prove this yourself by hand-coding a multilayer perceptron for XOR that uses a simple threshold activation.
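
For instance, here is one hand-picked (and by no means unique) set of weights that computes XOR with nothing but a step activation:

```cpp
#include <cstdio>

// Simple threshold (step) activation: no sigmoid anywhere.
int step(double x) { return x >= 0.0 ? 1 : 0; }

// A two-layer perceptron for XOR with hand-picked weights.
int xorNet(int a, int b) {
    int h1 = step(a + b - 0.5);   // fires if a OR b
    int h2 = step(a + b - 1.5);   // fires if a AND b
    return step(h1 - h2 - 0.5);   // fires if OR but not AND  ->  XOR
}

int main() {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            std::printf("%d XOR %d = %d\n", a, b, xorNet(a, b));
}
```

Working through the four input pairs by hand gives 0, 1, 1, 0, so the threshold-only network does separate the non-linear XOR problem.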

Quote: Original post by willh
If you do plan on forgoing the backprop route:
You don't need to use a sigmoid (or any other nonlinear function) for your activation function. The nonlinear activation function is only there to facilitate backprop. You can prove this yourself by hand-coding a multilayer perceptron for XOR that uses a simple threshold activation.


This is not quite consistent with the history of AI, according to Practical Neural Network Recipes in C++ by T. Masters.

Rosenblatt's 1957 perceptron model only solves linearly separable problems (e.g., it can't solve the XOR problem). Minsky et al.'s 1969 Perceptrons book crushed the AI research community's morale and funding to the brink of death.

Rumelhart et al.'s 1986 non-perceptron model is not limited to linearly separable problems (e.g., it can solve the XOR problem).

This book is fairly old, so perhaps there is a new perceptron model that I am not aware of... edit: Ah yes, it seems that various tricks and shell games may be used to massage nonlinear input data to make it work better with perceptron models, but this is not the same thing as directly accepting raw nonlinear input data the way a feedforward backpropagation neural network does. It seems common for the massage process to turn the nonlinear data into practically linear data by adding in a motherlode of dimensions. It seems that the perceptron is still an inherently linear beast. If you have code that shows that the Masters book is wrong, then I would love to see it, because this subject interests me a lot.

'willh' said:

If you do plan on forgoing the backprop route:
You don't need to use a sigmoid (or any other nonlinear function) for your activation function.


This is not quite consistent with the history of AI, according to Practical Neural Network Recipes in C++ by T. Masters.

Rosenblatt's 1957 perceptron model only solves linearly separable problems (e.g., it can't solve the XOR problem). Minsky et al.'s 1969 Perceptrons book crushed the AI research community's morale and funding to the brink of death.

Rumelhart et al.'s 1986 non-perceptron model is not limited to linearly separable problems (e.g., it can solve the XOR problem).

This book is fairly old, so perhaps there is a new perceptron model that I am not aware of... edit: Ah yes, it seems that various tricks and shell games may be used to massage nonlinear input data to make it work better with perceptron models, but this is not the same thing as directly accepting raw nonlinear input data the way a feedforward backpropagation neural network does. It seems common for the massage process to turn the nonlinear data into practically linear data by adding in a motherlode of dimensions. It seems that the perceptron is still an inherently linear beast. If you have code that shows that the Masters book is wrong, then I would love to see it, because this subject interests me a lot.


The very first thing you should do is to try it yourself using a pen and paper. It will only take about 3 minutes.

- No tricks or shell games needed.
- No projecting into a higher dimensional space
- No need to massage the input

Using two layers of perceptrons (a multilayer perceptron), one is able to solve XOR using a simple threshold activation function. If you would rather just see it, then, via Google (XOR Neural Network):
http://library.thinkquest.org/29483/neural_index.shtml

The perceptron is just a type of weak classifier. Weak classifiers, when layered (or stacked, or chained, etc.), can of course separate non-linear problems.

Backpropagation is a gradient descent learning algorithm (i.e. hill climbing). For the sake of the learning algorithm, the multilayer perceptron model needs a gradient response to a given feature. Threshold activations don't give much of a gradient, whereas sigmoids do. In practice you could use just about any differentiable nonlinear function.
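
To make that concrete, here is a small sketch comparing the two activations' derivatives, which is the only thing backprop's weight update actually consumes:

```cpp
#include <cmath>

// Sigmoid and its derivative: smooth, with a nonzero gradient everywhere,
// so gradient descent always has a direction to move the weights in.
double sigmoid(double x)      { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoidPrime(double x) { double s = sigmoid(x); return s * (1.0 - s); }

// Threshold: the derivative is zero everywhere it exists (and undefined at 0),
// so backprop's weight update gets nothing to follow.
double threshold(double x)           { return x >= 0.0 ? 1.0 : 0.0; }
double thresholdPrime(double /*x*/)  { return 0.0; }
```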

If we were talking about support vector machines, then yes, you would be right about massaging the data. That said, you can add a gradient descent learning algorithm to the front end of something like SMO to find good parameters for the 'masseuse'. I'm not aware of this being reported in the literature, but I am aware of it being done in practice.

If you're interested in artificial neural networks and want to see something neat (hehehe), check out:
http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies

Will

Quote: Original post by willh
Using two layers of perceptrons (a multilayer perceptron), one is able to solve XOR using a simple threshold activation function. If you would rather just see it, then, via Google (XOR Neural Network):
http://library.thinkquest.org/29483/neural_index.shtml


Edited...

I see now that multiple unit and single unit perceptrons are different things, though this Recipes book also talks about the failure of multilayer perceptrons.

So is a standard feedforward backpropagation network a multilayer perceptron? If not, what's the difference?

Thank you for the information on the other topics. That is very fascinating material.

The differentiability of the activation function is important too, which is just my way of pretending to be fancy. :)

I see now that multiple unit and single unit perceptrons are different things, though this Recipes book also talks about the failure of multilayer perceptrons.

So is a standard feedforward backpropagation network a multilayer perceptron? If not, what's the difference?



A standard feedforward backprop network is a multilayer perceptron. Feedforward means that data only flows forward from one layer to the next; other arrangements allow perceptrons in one layer to feed back into themselves, or into perceptrons in a previous layer.

Backpropagation is a learning algorithm; it isn't a network type. You can apply multiple learning algorithms to the same network -- you could use something like NEAT on an ANN first, and then, after it's performing well, use backpropagation to 'fine tune' the weights.

Pretty much all of these statistical/comp-sci AI solutions can be considered function approximators. In the case of XOR, the function is XOR. It's used as a staple experiment only because the function is non-linear and very simple. Knowing that we can express XOR using a multilayer perceptron (in other words, given the tools at hand, we know that a solution exists), we can then test different learning algorithms to see how well they can find said solution. As an example, you can build the optimal XOR ANN by hand (see the link I posted earlier) and compare it to the one your learning algorithm finds.

There is nothing inherently wrong with the perceptron. In theory you could build Windows XP using nothing but perceptrons. It wouldn't be terribly efficient but that's not really the problem. The problem is usually with the learning algorithm, and how well it is able to approximate a given function. There is the other problem of the human operator too: Usually you build a function approximator when you don't know what the original function should look like. :)

With backpropagation there are a lot of unknown parameters. The biggest is the architecture of the network on which backprop is applied. Then there are the learning rate, the number of epochs, and the starting weights. Evolving an ANN addresses some of these problems, but the trade-off is that it can take much longer to find an answer (if one exists).
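
Just to make those unknowns concrete, here is the kind of parameter list a plain backprop experiment has to pin down before training even starts; the struct and field names are purely illustrative:

```cpp
#include <vector>

// Illustrative list of the choices a backprop run has to make up front.
struct BackpropConfig {
    std::vector<int> layerSizes;   // network architecture, e.g. {2, 2, 1} for XOR
    double learningRate = 0.1;     // step size for gradient descent
    int    epochs       = 10000;   // how many passes over the training set
    unsigned weightSeed = 42;      // the starting weights come from this RNG seed
};
```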

Will



OK, well, I'm still confused. It seems that multilayer perceptrons were known in Minsky's time, almost two decades before Rumelhart's time. There must be a critical difference between the two.

This topic is closed to new replies.
