tl;dr: I have finally managed to build and train a very simple neural network, ‘GamerNet’, and in this post I’ll show you what he’s built of, and what I’m trying to use him for. For a little refresher on what that is and why I’m doing it, please refer back to Part 1.
So, what have I built exactly?
Well, I like to call him GamerNet.
III. Construction of GamerNet.
As I pointed out earlier, the math behind building a neural network is odd; as in, beyond my immediate abilities odd. Fortunately though, one of the convenient things about the internet is that what you can’t do, you can Google, and, if you’re lucky, you can run into someone who’s already tried to deal with it. In my case, I got extremely lucky, because I didn’t run into just one problem solver, I ran into a whole company full of them.
Matlab, by Mathworks, is not software I’ve had much experience with (in fact, before a few weeks ago, I had literally no experience with it whatsoever) but now I only wish I had discovered it sooner. It is basically a dataset visualization and analysis tool, and has a variety of plugins for different mathematical applications, which they like to call toolboxes. They also happen to have something called the Neural Network Toolbox, which to a n00b like me, was an absolute goldmine of a find, because it allowed me to create neural networks using a clean and simple GUI approach; it is literally more than half the reason this post is possible at all.
So, what is GamerNet? Well, to be honest, it’s just a name; I like video games, and they’ve been on my mind a lot recently, so I figured it would be a relatively interesting example to illustrate what I’m trying to do here. Don’t worry if you don’t know anything about games or gamers, it’s just to provide some context for all the numbers I’m about to throw at you. Speaking of which:
This is a random survey response generator I made in Excel (which by the way, I’ve used to frame most of my results; I figured it’d be more familiar than Matlab). I’ve provided the download link so you can see what’s going on here, but it’s essentially just a data generator glued together with a bunch of ‘rand()’s and ‘if()’s. But, I’ve gone a little bit further; I’ve seeded it with a few patterns:
'OCCUPATION' = CHOOSE(RANDBETWEEN(1,2),"Student","Professor")
'AGE' = IF('OCCUPATION'="Student",RANDBETWEEN(18,22),RANDBETWEEN(30,45))
'FACEBOOK' = IF('AGE'<35,"YES","NO")
'TWITTER' = IF('AGE'<30,"YES","NO")
'GOOGLE+' = IF(RANDBETWEEN(0,1)=1,"YES","NO")
'LINKEDIN' = IF('AGE'>21,"YES","NO")
'GAMER' = IF('AGE'<25,"Gamer","Not Gamer")
Which basically translates to:
- Randomly decide if they’re a student or professor
- If they’re a student, give them a random age between 18 and 22, or between 30 and 45 if they’re a professor.
- If they’re younger than 35, give ‘em a Facebook profile
- If they’re younger than 30, give ‘em a Twitter
- If they’re older than 21, give ‘em a LinkedIn
- Randomly give them a Google+ Profile, and most importantly
- If they’re below 25, make them a Gamer, and if not, then not.
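For anyone who would rather read code than spreadsheet formulas, here is a minimal Python sketch of the same rules. It is an illustration only, not the Excel workbook itself: the function name `make_respondent`, the dictionary keys, the fixed seed, and the sample size of 20 are all assumptions made for the example.

```python
import random


def make_respondent(rng):
    """Generate one fake survey response following the seeded rules above."""
    occupation = rng.choice(["Student", "Professor"])
    # Students are 18-22 and professors 30-45, both ends inclusive (like RANDBETWEEN).
    age = rng.randint(18, 22) if occupation == "Student" else rng.randint(30, 45)
    return {
        "occupation": occupation,
        "age": age,
        "facebook": age < 35,
        "twitter": age < 30,
        "google_plus": rng.random() < 0.5,  # pure coin flip, no pattern
        "linkedin": age > 21,
        "gamer": age < 25,
    }


rng = random.Random(0)  # fixed seed (an assumption) so the sample is repeatable
survey = [make_respondent(rng) for _ in range(20)]  # 20 rows is an arbitrary size

for row in survey[:3]:
    print(row)
```

One side effect of these rules is worth noticing: because students are always under 25 and professors always 30 or older, every student comes out as a Gamer with a Twitter account, and every professor as neither.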
Normally I don’t try to alter my datasets with existing patterns, but since the purpose of this exercise is to see whether or not a network can learn, I had to give it patterns I could actively look for, in order to verify its success. I picked one randomly generated set of data to treat as my Final set for this experiment:
I then began converting this data into a set of binary inputs. As we discussed earlier, neural networks (or at least the ones I’m playing with) aren’t very smart, and dealing with continuous variables (a range of numbers) is complex for them, so we have to convert the data into something a computer can understand, and few things are simpler to a computer than binary.
Before I lose the less brave of you, don’t be afraid of the 1s and 0s; all you need to remember is that a ‘1’ means ‘Yes’ and a ‘0’ means ‘No’, so when you see a 0 in a column, all that means is that that variable is ‘off’ (and ‘on’ if it’s a 1). For example, you’ll note in the following table that I had to split ages into different bins. This is because the network can’t deal with numbers like 18 and 27 and everything in between, but if I turn age into ‘boxes’ of ages, then the machine can make sense of it: a 1000 person is a teenager, a 0100 person is in his 20s, a 0010 person is in his 30s, and a 0001 person is in his 40s, and since no one can be more than one age at once, there will only be one ‘1’ in the first four columns:
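The age binning described above can be sketched in a few lines of Python. This is a hedged illustration rather than the actual spreadsheet logic: the helper names `encode_age` and `encode_row` are made up for the example, the column order follows the one used for the network's inputs later in the post, and treating Student as 1 and Professor as 0 is an assumption.

```python
def encode_age(age):
    """Return a one-hot list [10s, 20s, 30s, 40s] for an age from 10 to 49."""
    if not 10 <= age <= 49:
        raise ValueError("age falls outside the four bins used here")
    bins = [0, 0, 0, 0]
    bins[age // 10 - 1] = 1  # 18 -> index 0, 27 -> index 1, 45 -> index 3
    return bins


def encode_row(age, is_student, twitter, facebook, google_plus, linkedin):
    """Turn one survey response into the 9 binary inputs.

    Assumed column order: 10s, 20s, 30s, 40s, Occupation, Twitter,
    Facebook, Google+, LinkedIn. Occupation = 1 for a student is an assumption.
    """
    flags = [is_student, twitter, facebook, google_plus, linkedin]
    return encode_age(age) + [int(flag) for flag in flags]


print(encode_age(18))  # [1, 0, 0, 0]
print(encode_age(27))  # [0, 1, 0, 0]
print(encode_row(19, True, True, True, False, False))  # [1, 0, 0, 0, 1, 1, 1, 0, 0]
```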
Don’t worry if you don’t get that quite yet, just try to remember that it just means ‘off’ and ‘on’. Next, I had to transpose (re-orient) the data, because Matlab kept being fiddly about vectors and indices:
Now, those of you who aren’t scrolling too fast have probably noticed the last column, ‘Gamer?’, is missing. This is because the neural network requires two things: an input set that handles the input variables, and an output set, or target set, that handles the output variables associated with each slice of data. In order to teach a neural network, you have to tell it what the right answers are, so it can associate the right set of input variables with the right output variables.
That is the trick that underlies neural networks; you have to have some data you already know is right for the computer to practice with.
Here is what the computer’s answer key looks like:
Next, we import our data into Matlab, and run the appropriate neural network software tools. I can’t really get into much of the detail of that here because it’s rather technical (and I don’t like screenshotting all the mistakes and errors I make while I tinker around) but suffice it to say, through a series of steps, we can train this network to associate a matrix of inputs to a matrix of outputs. What’s more, Matlab also gives us the tools to determine how well our neural network is learning, and here I will illustrate two such ‘tests’ of intelligence: Confusion and Performance:
Performance is simply the graph of Mean Squared Error over time. It is a term I (and probably a few of you) have encountered already in COMM 291 with Jonathon Berkowitz; MSE is just the average of the squared differences between the actual values and the predicted values, and it tells you how ‘off’ your predictions are from the right answers (it’s squared so we don’t have to bother about negative distances, and just deal with magnitudes). Epochs are ‘rounds’ of training, each one a full pass through the training data; Matlab uses an algorithm I do not yet understand to find the best time to stop training the network in order to minimize the error, without forcing it to ‘overfit’ the data, and as you can tell, it decided that the best time was apparently around the 20th epoch.
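To make the Mean Squared Error idea concrete, here is a small worked example in Python. The target and prediction values are invented purely for illustration and are not taken from GamerNet's training run.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must be the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Made-up values: 1 means 'Gamer', 0 means 'Not Gamer'.
targets = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.1]

# Squared errors are roughly 0.01, 0.04, 0.16 and 0.01, so the mean is about 0.055.
mse = mean_squared_error(targets, predictions)
print(round(mse, 3))
```

Note how the one poor guess (0.6 for a true 1) contributes most of the error: squaring punishes big misses far more than small ones.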
Confusion, on the other hand, is an amusingly named graph that tests how well the network did at categorizing the data; Output Class tells you what the network guessed a sample of data was, and Target Class tells you what that sample actually was. As you can see from the red and green squares, the network got it right 100% of the time, with no ‘wrong’ classifications, but this is probably due more to our dataset being tiny than our network being a genius; erring is human and expected, even amongst simulated humans.
In summary, here is the diagram of the neural network we have just built out of an input layer of 9 binary input variables (10s, 20s, 30s, 40s, Occupation, Twitter, Facebook, Google+, LinkedIn), an output layer with 2 binary output variables (Gamer, or Not Gamer), and, in between, a hidden layer of 10 sigmoidal neurons whose weights and biases have been trained:
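The bookkeeping behind a confusion graph is simple enough to sketch in Python. The labels below are invented for the example (they deliberately include one mistake, unlike the perfect result reported above), and `confusion_counts` is a hypothetical helper name.

```python
from collections import Counter


def confusion_counts(target_classes, output_classes):
    """Count how often each (target, output) pair of classes occurs."""
    return Counter(zip(target_classes, output_classes))


# Made-up labels: what each sample really was vs. what the network guessed.
targets = ["Gamer", "Gamer", "Not Gamer", "Not Gamer", "Gamer"]
outputs = ["Gamer", "Not Gamer", "Not Gamer", "Not Gamer", "Gamer"]

counts = confusion_counts(targets, outputs)
correct = sum(n for (target, output), n in counts.items() if target == output)
accuracy = correct / len(targets)

for (target, output), n in sorted(counts.items()):
    print(f"target={target!r:12} output={output!r:12} count={n}")
print("accuracy:", accuracy)  # 4 of the 5 made-up samples match, so 0.8
```

The matching pairs correspond to the green squares on the diagonal of the graph, and any mismatched pair would land in a red square.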
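To show how data flows through a network of this shape, here is a minimal forward pass in plain Python. It is a sketch under stated assumptions, not a reproduction of what Matlab builds: it assumes a single hidden layer of 10 units, uses the logistic sigmoid on both layers, and fills the weights with random placeholder values, so the network here is untrained and its outputs mean nothing.

```python
import math
import random


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_net(n_in=9, n_hidden=10, n_out=2, seed=0):
    """Build an UNTRAINED net; the random weights are placeholders only."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_hidden": matrix(n_hidden, n_in),
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": matrix(n_out, n_hidden),
        "b_out": [rng.uniform(-1, 1) for _ in range(n_out)],
    }


def forward(x, net):
    """Push one 9-value input through the hidden layer, then the output layer."""
    hidden = layer(x, net["w_hidden"], net["b_hidden"])
    return layer(hidden, net["w_out"], net["b_out"])


net = random_net()
sample = [0, 1, 0, 0, 1, 1, 1, 0, 1]  # hypothetical person in their 20s
activations = forward(sample, net)    # two values: [Gamer, Not Gamer]
print(activations)
```

Training is the process of nudging those weights and biases until the two output activations line up with the answer key; that part is what the Matlab toolbox handles.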
So, now we have a functional neural network, but what can we do with it?
III. Application of GamerNet.
This step frustrated me for a while. Thanks to Matlab I’d managed to build and train network after network, but that was it; none of the examples went any further. No one seemed to be able to tell me how I was supposed to actually get around to asking this thing what I wanted to ask. Well, after a series of trials and errors, I eventually managed to overcome this language barrier by discovering one simple line of text:
y = sim(net,x)
This Matlab command lets you simulate the neural network you just created on an outside set of data. To be honest, I still don’t even know if this is the right way of using it, but it’s the only thing I found that seemed to allow me to ask GamerNet the questions I wanted to ask; so long as I framed them properly in data. So, what did I want to ask, exactly?
Well, this summer I dealt with Hessian Matrices for the first time, and while I didn’t do so great in that class (turns out it’s hard to focus on learning math while you’re goofing off with your friends all summer), I learnt a lot about the underlying mathematics of economics, and was fascinated by the idea of these little ‘correlation matrices’. Unfortunately, while the math I learnt was useful for general cases, it became very obtuse when confronted with real data and functions; building an actual ‘correlation matrix’ using these tools would be extremely tedious (just partial derivatives on partial derivatives). But what if I could use a nonlinear system that could deal with data in order to simulate this effect?
What if I could use the neural network to create my very own little correlation matrix, that tells me the likelihood of something being A or B relative to a specific variable?
Here is the ‘test’ set I constructed for my neural network to think about:
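A probe set in which each sample switches on exactly one input can be sketched in a few lines of Python. The feature names follow the network's input order; the helper name `single_feature_probes` is made up for this example.

```python
FEATURES = [
    "10s", "20s", "30s", "40s",
    "Occupation", "Twitter", "Facebook", "Google+", "LinkedIn",
]


def single_feature_probes(n_features):
    """Return n samples, each with exactly one input set to 1 (an identity matrix)."""
    return [
        [1 if col == row else 0 for col in range(n_features)]
        for row in range(n_features)
    ]


probes = single_feature_probes(len(FEATURES))
for name, probe in zip(FEATURES, probes):
    print(f"{name:>10}: {probe}")
```

Since this matrix is its own transpose, it does not matter whether the tool expects samples as rows or as columns.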
As you can probably tell, I’ve designed it such that each sample has only one variable turned on; i.e. I’m asking the neural network whether or not someone is a Gamer, based on knowing just one thing about them. I don’t expect him to be perfectly accurate, I just want him to take his best guess, based on what I’ve taught him. Well, this is what he told me:
What does this mean? Well, maybe this will make it a little clearer:
Essentially, if someone is in their 20s or teens, it is highly likely for them to be a gamer, while people in their 30s or higher are likely to not be. Furthermore, if they’re a student, they’re apparently very likely to be a gamer, and the same goes for having a Twitter or Facebook profile. However, apparently having a Google+ profile doesn’t really matter either way, and LinkedIn seems to be an indicator of Non-Gamer tendencies, probably since every adult has one.
Another important thing to note is that the results do not sum up to 1; it is the simultaneous guess of whether a person IS or IS NOT a gamer, not just (1-P(gamer)); they are two separate guesses. Also, these are not probabilities, they are merely levels of activation, they just look like probabilities because ‘neuron activity’ is measured between 1 and 0.
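The point about the two outputs not summing to 1 can be illustrated with a short Python sketch. It assumes each output neuron applies its own logistic sigmoid independently, which is consistent with the behaviour described above but is an assumption about the network's configuration; the raw input values are made up. A softmax is shown alongside only as a contrast, since that is the kind of output that would behave like probabilities.

```python
import math


def sigmoid(z):
    """Each output is squashed into (0, 1) on its own, ignoring the other."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Outputs are normalized against each other, so they always sum to 1."""
    biggest = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - biggest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 0.5]  # hypothetical pre-activation values for [Gamer, Not Gamer]

independent = [sigmoid(z) for z in raw]
normalized = softmax(raw)

print("independent sigmoids:", independent, "sum =", sum(independent))
print("softmax:             ", normalized, "sum =", sum(normalized))
```

With independent sigmoids, both neurons can be fairly active at once (here the sum comes out well above 1), which is why the two numbers read better as separate levels of activation than as P(gamer) and 1 - P(gamer).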
So, you’ll probably notice that these guesses sound a lot like the patterns I seeded into this dataset earlier. Apparently the system has learnt them fairly well, and has even apparently noticed a few overlaps I didn’t put in there. Let’s try asking its opinion on a few more “complex” cases, specifically:
- A socially active 20 something college student.
- A 40 year old Professor who doesn’t care for the Internet.
- And an introverted high school student
Here are the results:
This is all just a bit of an exaggeration, but as you can see, the network thinks it is highly likely that the internet-savvy college student plays a few games, and also thinks that the professor probably doesn’t play much DoTA (or even know what that is). The teenager is slightly harder to categorize, and I’m still trying to figure out exactly why it gave me these numbers (probably because of the integrating effects and the small sample size), but the system still seems to think he is more likely to be into video games than not, which isn’t that bad of a guess.
Thus, with a little ‘hacking’, I’ve managed to create a very tiny, very stupid little brain that can not only learn, but answer a couple interesting questions. I have yet to see how it behaves when forced to interact with living data, but this is just one example of one application, and there are a whole variety of neural nets out there that can do a whole bunch of different things.
And now that I’ve figured out a few of the basics, I’m eager to see what else is next.
IV. Pessimism and Conclusions.
You can probably now see why this has me so excited; the possibilities of such a system are endless, and it is already being used for a variety of applications from handwriting recognition to big data, and who knows what else it could do yet. However, there are a few problems with this system, and it wouldn’t be fair if I didn’t acknowledge at least a few of them.
Firstly, neural networks are the ultimate ‘black boxes’: each one is grown and trained uniquely, and cannot be altered once it’s created; they are honestly more ‘born’ than made. As such, accurate performance is not just a whole heap of skill, but also a little bit of luck; just like making a real kid. However, this is as much a benefit as it is a weakness, because unlike children, neural networks are modular, so if you get access to better data, it is trivial to build a ‘better-informed’ neural net and upgrade your machine’s dumb old brain with a shinier new model; just replace your current black box with a new one.
The other argument, one a friend of mine brought up recently, is the one about causation vs correlation: I might just be seeing what I want to see, and not an actual picture of reality. I’ll admit it caught me off guard, and now I’m quietly waiting for someone smarter than me to stroll by this little experiment and explain to me how I’m doing it completely wrong (and take away my calculator for being so silly). To be honest though, I actually kind of hope that happens, because not only is it helpful to learn from someone who knows more than I do, but failure is also usually a pretty good teacher.
Still though, ANNs (Artificial Neural Networks) are a new technology, and there are many arguments both for, and against them. In this paper comparing neural networks and multiple regression analysis, they found that ANN models occasionally achieve a better fit and forecast than regression, namely because of their ability to catch sophisticated non-linear integrating effects; or to put it simply, they can see overlaps in the data that regression analysis does its best to ignore. It remains to be seen if that’s actually what I’m seeing here, but if that’s the case, then perhaps my little ‘correlation box’ is not quite so impossible after all.
In any case, this was a surprisingly profitable learning experience, and I think I’m going to start pursuing these ‘personal quests’ more often. I also realized that leaving a breadcrumb trail might help other people who are also googling similar things, because I know I had a tough time pooling resources. I resolved that for this experiment at least, I’d try to document whatever I found, so it was all at least in one place.
This bit is a personal note for future me. During this little ‘quest’ of ours, you’ve managed to acquire a variety of new terms and math constructs, learn how to use completely brand new software and tie it in with programs you’re already acquainted with, and practice how to document and promote things on the Internet. It’s taken roughly 17 days, on and off, sobriety permitting, to get this far. I’m not sure if anything we’ve learnt will be profitable yet, but I suppose you’ll know better than I will. Still, a remarkably fun experience; you really should try to do these more often.
To the rest of you, thanks for scrolling all the way to the end and celebrating this weird ‘birth’ with me. As a party favor, here are some of the resources I’ve used, so you can have a look at the data yourself (I’m still updating this slowly). If you have any questions, corrections, or other feedback, feel free to tell me over any network you like, including reality; I enjoy feedback. I haven’t decided where exactly I’m going to go from here (I tend to get distracted easily) but I’m sure I’m not done with this technology, and I’ll be back when I’ve discovered something new.
In particular, I’d really like to see how to make a little brain that can learn all by itself.