
Wednesday, September 7, 2016

Content-based Game Classification with Deep Convolutional Neural Network



One thousand video clips of one hundred games. The games are clustered according to their t-SNE components applied on the output of a CNN trained to classify RTS games. You can interact with the interface here.
[video]
The figure above, generated in fast motion.

The figure above is from an interactive demo for this article that you can find here. I recommend having a look at it before continuing.

Introduction

A while ago, I started working with convolutional neural networks (CNNs) in the computer game domain. I was particularly interested in extending their success to video games and investigating whether they can be used to learn features about games similar to what they learn from images and videos in many other areas. In this post, I will explain what I have done so far and show some of the recent results.

Goal

There has been a lot of work recently on video classification, tagging and labelling. My interest lies in bringing these ideas to games. My hypothesis is that video game trailers and gameplay videos provide rich information about the games in terms of visual appearance and game mechanics that would allow CNNs to detect similarities along a number of dimensions by "watching" short video clips.

Gameplay 2M dataset

As you already know, CNNs are data hungry, so I started by collecting the data I need: videos of gameplay classified according to a number of categories. The easiest way I found to collect the data was to prepare a list of game titles, download YouTube gameplay videos from different channels, and associate each game with a set of categories I eventually got from Steam.

So, I initialised the process and started running experiments once I had data for 200 games ready. For each game, I downloaded 10 gameplay videos. Since those vary in length, I cropped a 5-minute segment from each of them. Then, from each segment, I randomly sampled 10 shorter half-second clips. Finally, from each of these short clips I extracted 100 frames. If you do the calculation, you will see that I ended up with 100*10*10 = 10,000 gameplay images per game, so the dataset I will be using for this post contains 2M gameplay images.
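The sampling scheme above can be sketched as pure index arithmetic before any video decoding. The frame rate and function name below are my own assumptions, not the original pipeline; note that at ~30 fps a half-second window holds fewer than 100 frames, so some indices necessarily repeat:

```python
import random

def sample_frame_indices(segment_frames, fps=30, n_clips=10, frames_per_clip=100, seed=0):
    """Pick n_clips random half-second windows from a segment and return,
    for each window, frames_per_clip frame indices spread over it."""
    rng = random.Random(seed)
    clip_len = max(1, fps // 2)  # frames in half a second
    clips = []
    for _ in range(n_clips):
        start = rng.randrange(0, segment_frames - clip_len)
        # at 30 fps a half-second window holds only 15 frames,
        # so sampling 100 indices repeats some of them
        idx = [start + (i * clip_len) // frames_per_clip for i in range(frames_per_clip)]
        clips.append(idx)
    return clips

# a 5-minute segment at 30 fps = 9000 frames
clips = sample_frame_indices(segment_frames=5 * 60 * 30)
```

With 10 videos per game, this yields the 10 x 10 x 100 = 10,000 images per game mentioned above.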

As for the game classes, I queried Steam for the categories assigned to each game by the users. I ended up with a 24-D vector of categories including whether the game is an action, single-player, real-time strategy, platformer, indie, or first-person shooter game, etc. Each game is assigned to one or more of these categories. To create one category vector per game, I averaged the users' votes per category, applied a simple step function with a threshold of 0.5, and assigned the final vector to each game (more specifically, to each image).
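The averaging-plus-threshold step can be sketched as follows; this is a toy example with three categories and three hypothetical user votes rather than the full 24-D Steam vector:

```python
import numpy as np

# hypothetical per-user tag votes for one game:
# rows = users, columns = categories (e.g. action, single-player, RTS)
votes = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])

# average per category, then apply a step function with threshold 0.5
category_vector = (votes.mean(axis=0) >= 0.5).astype(int)
print(category_vector)  # -> [1 1 0]
```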

Here are some short clips from some of the games I used for training and the categories they belong to according to Steam users:

[video]
Full Spectrum Warrior: RTS = 1, Action = 1, Single-player = 0

[video]
Empire: Total War: RTS = 1, Action = 1, Single-player = 1

[video]
Team Fortress: RTS = 0, Action = 0, Single-player = 1

Method

Most recent work in deep learning relies on established state-of-the-art models and fine-tunes them on a new dataset. I follow this stream of work, as training from scratch is very time and resource consuming. Some state-of-the-art CNNs are very good at extracting visual feature representations from raw pixel data. In my work, I use the convolutional layers of the VGG-16 model to extract generic descriptors from the gameplay images.

I train on static images of gameplay extracted from the videos (I believe adding temporal information will improve the results, but I wanted to start simple and build from there). I built classifiers for only three categories: RTS games, action games and single-player games, as those provided the most balanced data in terms of positive and negative classes, but I will be running more experiments once I have more data.

To build the classifiers, I first pass all images through the convolutional layers of the popular VGG-16 model to extract the visual feature descriptors that I later use to train NN classifiers. Each classifier consists of the convolutional layers from VGG-16 followed by two dense layers of 512 nodes each. Finally, I use a sigmoid function that outputs the probability of an image belonging to a class.
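A minimal sketch of one such binary classifier in Keras, assuming the usual 224x224 VGG-16 input size. I pass weights=None to keep the sketch self-contained and runnable offline; the setup described above instead loads the ImageNet-pretrained weights and keeps the convolutional layers frozen as a fixed feature extractor:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# convolutional base of VGG-16, used as a frozen feature extractor
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(image belongs to the class)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

One such model is trained per category (RTS, action, single-player), each on the same extracted descriptors.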

I trained three binary classifiers to learn each category independently (I could as well have used other multilabel learning methods, but this is what I use for now). I split the data into three sets: training (70%), validation (20%) and testing (10%).
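The 70/20/10 split can be done with two consecutive calls to scikit-learn's train_test_split; the data here is a hypothetical placeholder for the images (or their extracted descriptors) and labels:

```python
from sklearn.model_selection import train_test_split

# placeholder data: 100 samples with binary labels
X = list(range(100))
y = [i % 2 for i in range(100)]

# 70% train, then split the remaining 30% into 20% validation / 10% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=1 / 3, random_state=42)
```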


VGG-16 architecture with two dense layers of 512 nodes each.

Analysis: how good are the classifiers?

The three classifiers performed remarkably well in terms of classification accuracy. I got accuracy of up to 85% when classifying action games at the image level, and the results for RTS and single-player games were slightly lower, reaching 76% and 72%. I also calculated the accuracies in other settings where I average the performance per 0.5-sec clip, per 5-min clip and per game. In some cases, it seems that looking at multiple images does indeed increase the accuracy, while in others (when classifying action games), the model was just as accurate on individual images as on the whole game.
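Aggregating per-image outputs into clip-level and game-level decisions amounts to averaging the predicted probabilities before thresholding; the numbers below are random placeholders, not the real predictions:

```python
import numpy as np

# hypothetical per-image probabilities for one game from one classifier,
# grouped as 10 half-second clips x 100 frames each
rng = np.random.default_rng(0)
probs = rng.uniform(0.4, 0.9, size=(10, 100))

image_level = probs > 0.5              # one decision per image
clip_level = probs.mean(axis=1) > 0.5  # average, then decide per clip
game_level = probs.mean() > 0.5        # one decision for the whole game
```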

Following some inspiring work (here and here), I further looked at the distribution of the classes according to the first two t-SNE components (performed on the PCA projection of the output of the first dense layer of the classifiers). I did this for a sample of the dataset (neither my machine nor t-SNE has enough power to process the whole dataset), and you can clearly see the classification boundary between positive and negative samples in the 5-min clips.
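The PCA-then-t-SNE step might look like this in scikit-learn; random activations stand in for the real dense-layer outputs, and the component counts are my own assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# hypothetical 512-D activations of the first dense layer for a sample of clips
rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 512))

# reduce with PCA first (t-SNE scales poorly with dimensionality),
# then project onto the first two t-SNE components
compact = PCA(n_components=50).fit_transform(activations)
embedding = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(compact)
```

Each row of `embedding` gives the 2-D coordinates used to plot one clip in the figures below.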

t-SNE visualisation of the distribution of 15,000 half-second clips classified by the RTS classifier.

t-SNE visualisation of the distribution of 15,000 half-second clips classified by the single-player classifier.

I also looked at the distribution of games, which I thought was particularly interesting because the network receives no explicit information during training about which game the images come from (it only knows whether an image belongs to a particular class or not). If my generic image descriptors are powerful enough, I expected images/clips of the same game to cluster together. So I regenerated the same figures as above, but this time I colour-coded by game title, so that images or clips belonging to the same game are given the same colour.
Same figure as above, but points are coloured by game title (RTS classifier).

Same figure as above, but points are coloured by game title (single-player classifier).

You can clearly see that some clusters of clips belonging to the same game are preserved quite well. This is a really interesting finding, as it seems that the models somehow learned an implicit representation of the games although they weren't really trained to recognise them.

This last finding suggests that games with similar visual features according to a given category should also be projected close to each other. So this time, I visualised the distribution of 5-min clips from the RTS classifier while showing the titles of the games. Here is how the figure looks, with some zoom-ins.

Some zoom-ins from the t-SNE distribution of the output of the RTS classifier.

Analysis: how different is the data?

Of course, some videos are more representative of a game than others, and therefore I expect variations in accuracy at the image and video levels. To give you an idea of how accuracy changes per image, here are some results from the action-games classifier for seven games. The performance clearly differs among games, but there are also clear fluctuations within the same game. For some games, such as Hexen II and Team Fortress (numbers one and five in the figure), you can confidently tell by looking at the graph that they have a strong action element.
Accuracy per image by the action game classifier for seven games. 
So, why do some images give high accuracies while others don't? What is it that the network is interested in? Since I'm using a pre-trained model for visual feature extraction, visualising the convolutional layers won't really help. Instead, I looked at individual images with high and low accuracies for some games. Here is an example from the game Hexen II when the classifier is trained to see it as an action game.
Accuracy per image for the game Hexen II by the classifier of action games.
What I can tell for now (from these snapshots and many others I visualised) is that the amount of lighting matters quite a lot: the more light, the higher the action. A similar analysis of RTS games showed that panels such as the ones below, even when only partially shown, are what contribute the most to recognising games as RTS.
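A rough way to probe the lighting observation is to compare the mean luminance of high- and low-scoring frames. This is my own sanity check, not part of the original analysis; the Rec. 601 luma weights are a standard choice and the frames below are synthetic placeholders:

```python
import numpy as np

def mean_luminance(frame):
    """Per-frame brightness: Rec. 601 luma of an RGB frame in [0, 255]."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return float((0.299 * r + 0.587 * g + 0.114 * b).mean())

# two synthetic frames: a dark one and a bright one
dark = np.full((64, 64, 3), 30, dtype=np.uint8)
bright = np.full((64, 64, 3), 220, dtype=np.uint8)
assert mean_luminance(dark) < mean_luminance(bright)
```

Correlating this score with the classifier's per-image probability would make the "more light, more action" observation quantitative.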



For some videos, the models are more confused. This happens a lot when the classified category is a minor feature of the game and not one of its main characteristics. This, in fact, is the main reason I prefer to use a sigmoid function as the output of the classifiers: I can then interpret the output probabilistically and say that a low probability translates to showing a small amount of a specific feature. This allows me to better understand the games and means I can define a similarity function on these vectors to find out which games are similar to each other and in what aspects, but more on that in the future.

Finally, here are some snapshots from the demo you saw at the top of the page. Here, I tried to visualise the five-minute clips according to their t-SNE dimensions. Since I only care about their clusters, and not their exact position in the space, I calculated the distances between all of them and connected each node to its 10 nearest neighbours. To make the graph easier to understand, I also gave nodes belonging to the same game the same colour. If you zoom in, you can see the titles of the games and which games are connected to each other. The figures below are from the results of the RTS classifier.
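Building such a nearest-neighbour graph can be sketched with scikit-learn; random 2-D points stand in for the real t-SNE coordinates, and the clip count is arbitrary:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# hypothetical 2-D t-SNE coordinates for 200 five-minute clips
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))

# query k+1 neighbours because the nearest neighbour of each point
# is the point itself, which we then drop
nn = NearestNeighbors(n_neighbors=11).fit(points)
_, indices = nn.kneighbors(points)
edges = {(i, j) for i, row in enumerate(indices) for j in row[1:]}
```

Each of the 200 nodes gets 10 outgoing edges, which is exactly the structure rendered in the interactive demo.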


Now, this certainly doesn't yet allow me to draw conclusions about which games are similar and how, but I believe that with more data and classification along more dimensions, we can build a powerful tool for automatic content-based classification of games.

This work is done in collaboration with Mohammed Abou-Zleikha.



Tuesday, August 9, 2016

Summary: How to Start a Startup: Lecture 1 (by Y Combinator)

A year ago, I got an idea for an app that I believe can help improve parents' lives by making it easier for them to connect with old friends and make new ones. I named it Menura, and as I started working on it, I wanted to learn about the process of starting a new business and building a network. So I attended some events in Copenhagen, where I live, and I met some great people. One of them is David Helgason, the founder and former CEO of Unity. We talked about best practices when starting out and the best resources to learn from. We mentioned reading books and meeting people, among other things, but the one thing he highly recommended was that I go and listen to the "How to Start a Startup" lectures by Sam Altman, the president of Y Combinator, and so I did.

The series contains 20 lectures of about 45 minutes each, initially presented at Stanford University in 2014. Sam brought together a great group of experienced and successful people to talk and share lessons from their own experience starting (now multi-million-dollar) startups. Speakers include, for instance, Peter Thiel, known as the co-founder of PayPal; Reid Hoffman, co-founder of LinkedIn; Aaron Levie, co-founder of Box; Ben Silbermann, co-founder and CEO of Pinterest; and Paul Graham, the co-founder of Y Combinator.

I listened to the lectures while Menura was in its early stages, and I enjoyed and learned a lot from every one of them. But now that I'm almost done with the development phase and need to execute the following steps, I feel like I don't recall many of the details in the lectures. So I decided to listen to them once more, but this time taking notes of the important points to keep as a reference for the future. I will be sharing my notes so anyone interested can benefit from them. Note, however, that these are my personal notes, which means they are subjective, and you might end up focusing on other ideas if you listen to the lectures yourself (which I highly recommend). Nevertheless, I think they are interesting and worth sharing.

Without further ado, let's get started!
Lecture 1: How to Start a Startup
To start a successful startup, you need to excel in four main areas:
1.     Idea: 
o   Execution is harder and 10 times more important
o   Bad ideas are still bad (even with great execution)
o   Think long term
o   Should be difficult to replicate
o   Needs critical evaluation that includes
§  Market: size, growth
§  Company: growth strategy
§  ...
2.    Product
3.    Team
4.    Execution
Where success = idea * product * team * execution * (w * luck),
and w is a number in the range [0, 10000]. What is nice about this formula is that the rest of it is somehow controllable :).

Starting a startup is really hard:
1.     Do not do it to become rich (there are easier ways)
2.    Do it if you have a solution to a problem
3.    Ideas first and startup second
4.    The good idea is the one you think about frequently when not working
You should focus on a mission-oriented startup:
  • You are committed = you love what you are doing
  • You have great patience: startups take about 10 years
Good ideas are unpopular but right:
  • You can practice identifying them
  • They look terrible at the beginning
  • Start with a small market to create a monopoly and then expand
  • You will sound crazy but be right
  • Look for an evolving market (big in 10 years)
  • The market is better when it is small and growing rapidly, it means the market is more tolerant and hungry for a solution
  • You can change everything but the market
  • Answer why now?
  • To build something you yourself need is better to understand the problem
  • The idea should be explainable in one sentence
  • Think about the market (what people want) first
Good practice:
  • Be confident
  • Stay away from nay-sayers (most people if it is a good idea)
Good Product is something users love:
  • Until you build it, nothing else matters
  • Spend your time building and talking to customers
  • Marketing is easy when you have a great product
  • Better to build something a small number of users love than to build something a large number of users like. Easy to expand from there
  • Find a small set of users and make them love what you are doing. 
  • Build a product that is so good that it will grow by the word-of-mouth
  • Most companies die because they didn't make something users love, not because of competition
  • Start with something simple (I like what Leonardo Da Vinci said about this "simplicity is the ultimate sophistication" and Steve Jobs' famous quote "Simple can be harder than complex")
  • Quality and small details matter
  • Be there for your customers (even at midnight)
  • Recruit feedback users by hand (this is the stage I'm at with Menura right now, and I can't tell you how hard it is; you literally have to send personal emails and messages to every single one of your potentially interested users and keep the conversation going.)
  • Do not do ads to get initial users, you don't need many, you need committed ones
  • Loop from feedback to product decisions by asking the users:
    • What they like/dislike
    • Would they recommend it to others?
    • Have they recommended it already?
    • What features would they pay for?
  • Make the feedback loop as tight as possible for rapid progress
  • Do it yourself (that includes everything, from development to marketing to customer support...)
  • Startups are built on growth, so monitor it
  • If this (your product) is not right, nothing else will matter
Discussion of team and execution is left to the next lecture, and we will now move on to answer the most important question (in my opinion): why should you start a startup?
Why should you start a startup?
You probably thought of it as glamorous (you will be the boss, it offers attractive flexibility, you will be making an impact and $$). In reality, however, it is a lot of hard work and pretty stressful. Here is why:

You will be:
1.     having a lot of responsibility
2.    always on call
3.    taking care of fundraising
4.    gathering media attention (not always what you like to see)
5.    strongly committed  
6.    managing your own psychology 
And here is a more elaborate explanation of what you might think is attractive about it:
1.     Being the boss: not really true (you will be listening to and executing on everyone else's needs and feedback)
2.    Flexibility: also not true, as you will be always on call; you are the role model, you are always working
3.    Having more impact and more $$: you might actually make more money joining Facebook or Dropbox, and there you get to work with a team, so you might end up making more impact
After some thought-provoking points, it is now time to find out the real reason you should have for starting your own startup. It is actually pretty obvious: you simply
"can't not do it"
This means:
1.     You are passionate about it
2.    You are the right person
3.    You gotta make it happen
4.    You can't stop working on it
5.    You will force yourself into the world to achieve your vision
A pretty nice and thoughtful introduction. That was the end of the first lecture, and the finishing slide was a list of recommended books.




I have personally started reading Zero to One and I'm really enjoying it (the rest are on my reading list, which is growing very fast :)). I will probably share some summaries about it in another post, but that's it for now.

Main takeaways:
(These are the main points that stuck in my mind after listening to the whole lecture)
1.     Ideas are important but execution is vital
2.    Make something people love
3.    A small number of people loving your product is more important than a large number liking it
4.    Get your product right and everything else will follow smoothly from there
5.    Build your own small community of product-lovers and rely on the power of word of mouth (WoM)
6.    If starting a startup is the thing you can't do without, then you are on the right track (good luck, enjoy the journey!). Otherwise, join one of the great companies.
See you in the next lecture :)!




  

Thursday, July 14, 2016

How Writing is Similar to Drawing

I like drawing; it has been my hobby since I was a little girl. Somehow I grew up and got so absorbed by busy life that I didn't have time to draw any more. Recently, I missed it so much, and had a number of motivations, that I started drawing again, from the basics this time. As a grown-up, I'm enjoying the process even more, as I find it quite rewarding. It is a nice way to relax and let ideas flow into my head. Many of my "great" (because they are mine :)) ideas come while I'm drawing.

So, starting from the basics, the first thing I learnt is to start with a simple sketch highlighting the main features and proportions of the drawing. For instance, if I'm going to draw a human in a specific pose from a specific perspective, I should start with basic shapes, usually lines and spheres, depicting where each part should be placed.

Simple sketching while learning the basics

The next step is to do another pass and add more detail to the main shapes such as the shoulders, the chest, the arms and the legs. I then refine the drawing once more to add more features for the clothes, the face and the hair. Finally, I put in the final touches on the small details of the face, hair style and clothes.

Sketches about different poses

Starting with a sketch helps a lot with making sure the final drawing makes sense. Otherwise, it is very likely that I will end up with wrong proportions and weird-looking gestures even if the details are good.

As a researcher, a big part of my job requires writing, whether papers, articles, book chapters or blogs. Recently, while writing an article, it occurred to me that writing a good article is similar in many ways to making a good drawing. Meaning, if I have an idea of what I would like to write about, then a good article should start with a sketch of the main topics I will be covering and the length I should spend on each (the lines and circles of a drawing). This ensures I don't go off-topic, that there is no overlap between the sections, and that I don't expand in one area at the expense of another. From there, the process follows quite smoothly: refining each section by adding subsections and a few points about what each covers (specifying and placing the basic shapes of a drawing); another pass to rewrite the points into sentences (adding more features to each shape); and finally adding more details and making sure everything connects smoothly (the final touches on a drawing).

I like the idea of connecting two seemingly unrelated processes and I find it quite intriguing. I hope this realisation will help me (and you) enjoy writing and drawing even more :).

Wednesday, April 27, 2016

Ideas, ideas, and more ideas...


Here is a list of project ideas I'm interested in but unfortunately don't have enough time to work on on my own; I could certainly use some help :). Some of them are projects/services while others are (what I believe to be) useful mobile apps. I can supervise, co-found or advise on any of them. If you are interested in more information, drop me a line. I will keep updating the list as I find time and as new ideas come to mind.

Misc.: 

  1. A GoodReads-like website for research papers: I like how Goodreads works, and I think it would be great if we could build a similar platform specifically for research papers. I think rating and reviewing papers through such a platform is more reliable and useful than the current citation mechanism. Usually when you read papers, you may like some without citing them (which, in the current citation system, means they won't get any credit and your friends won't know you read these papers and found them nice). I believe such a reviewing and sharing system could potentially substitute for Google Scholar (in the long run); papers would be evaluated by a wider audience that includes whoever reads the paper (and not only the smaller network of people who cite it). Papers would be evaluated by the crowd, eliminating some of the inherent limitations of Google Scholar, especially the Matthew effect and vulnerability to spam (I borrowed these fancy words from Wikipedia ;-)).
  2. A collaboration platform for researchers that facilitates proposal of research ideas, discussions and collaborations on projects: Occasionally, I have interesting ideas that I would like to see implemented but I often lack the necessary expertise or knowledge in all the technologies needed to bring the idea to life. I might as well have the knowledge but lack the time to do the whole project on my own. I believe many of the researchers I know share the same experience. In the platform I propose, one could share a high level description of the idea and list the knowledge she is missing and the sort of collaboration she is looking for. Others can view, comment or start a serious discussion about possible collaboration. I believe we as researchers need to talk more to each other (and by other here, I mean researchers from other fields who we don’t usually meet in the conferences we usually go to). My hope is that such a platform could encourage collaborations among researchers who don’t usually get to talk or meet each other and foster discussions that advance research.
  3. A website with articles about recent trends in machine learning, data mining and AI: I know how hard it can be for Arabic-speaking students, especially in the IT field, to find useful resources even with the wealth of information on the Internet (which might also be a curse if you are new and need to navigate your way and filter what Google gives you). So, a while ago I decided to build a website where I write short articles on topics related to recent trends and techniques in machine learning and data mining. The idea is that the articles should be short, focused, easy to understand, example-oriented and in Arabic. That is because I want them to appeal not only to IT students, but to whoever is interested in advancing her knowledge in these areas. I really want to make this a reality and I could use all the help I can get (setting up the website, typing the articles (I'm very slow at typing Arabic), or even helping with suggesting topics, writing, reviewing and putting the content together).

Mobile Apps: I don’t have the business plan ready for these apps but I would very much like to be involved or co-found any of them. Let me know if you find them interesting.

  1. A mobile app that detects the level of noise in the environment and automatically mutes/unmutes the phone. This adds a touch of intelligence to your mobile and is really useful in cases where you are, for instance, in a meeting or a lecture: you don't want to be disturbed but forgot to put your phone on silent mode. The app periodically senses the noise level, estimates where you are, and adjusts the sound accordingly.
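The core decision loop of such an app could be sketched as thresholding with hysteresis; all thresholds and names here are hypothetical, just to illustrate the idea:

```python
def ring_mode(noise_db, currently_muted, mute_below=35.0, unmute_above=45.0):
    """Decide the phone's ring mode from an ambient noise reading (dB).

    Two thresholds (hysteresis) keep the phone from flipping back and
    forth when the noise level hovers around a single cut-off.
    """
    if noise_db < mute_below:
        return "silent"  # quiet room (meeting, lecture) -> mute
    if noise_db > unmute_above:
        return "ring"    # noisy environment -> ring audibly
    return "silent" if currently_muted else "ring"  # in between: keep state

assert ring_mode(30.0, currently_muted=False) == "silent"
assert ring_mode(60.0, currently_muted=True) == "ring"
assert ring_mode(40.0, currently_muted=True) == "silent"
```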

  2. A mobile app that switches between playlists according to the place and time of day: the songs you enjoy listening to in the morning are most likely different from those you listen to during your walk in the afternoon, and those are most likely different from the ones you listen to while cooking or running. A mobile app that detects your current activity and combines it with the time of day to recommend the next song, or to switch to an appropriate playlist, would be potentially useful. (This idea is inspired by a chat with my friend Yun-Gyung Cheong.)

Updates:

15-07-2016: Together with a friend, I started working on point 3 in the Misc. category: a website in Arabic to educate people about the latest trends in machine learning, data mining, and artificial intelligence. We named it ArLore and you can find it here: http://arlore.com/.

Saturday, April 2, 2016

Games as a testbed for research - Why?

People usually ask me about my research, and when I say I'm doing research on computer games, I can feel their disapproval (though no one has actually expressed it out loud). I can totally understand their reaction: I'm not a game developer, so I'm not really making games; I'm not doing pure Artificial Intelligence (AI), so I'm not actually contributing to making existing games any better; and I'm not working for the industry, so my work, so far, has no direct tangible influence.

I have actually been thinking about these issues for a while now and here is my attempt to clear this misunderstanding and clarify why what I'm doing is really interesting and more people should do it.

I have been working for a while (more than five years! time really flies) on player experience modelling (PEM) and procedural content generation (PCG), trying to come up with ways to improve and connect both. So far, I have made some progress, but it still fascinates me how little we know about human decision making and the unique ways in which people interact with digital media, on the one hand, and the sophistication of the process of creating games, on the other.

Creating games, to me, is very much like writing a novel. Almost everyone can write, but not everyone can come up with something interesting that others would like to read, and few can write something that appeals to a wide audience (something like the Harry Potter novels).

Understanding how humans come up with a good story is hard, and building a program that can imitate this process is even harder (it would have already been done if it weren't). The same applies to creating games. The difference is, if you want to make a good game, you can't rely on the imagination of the reader to set the stage; you have to master creating every aspect of it. Unlike novelists, game designers don't only create the story behind the game; they must also craft its visual artifacts, music, and mechanics, and that is why game creation is interesting: it combines so many creative processes. This is exactly why I personally think games are interesting as a testbed.

It has always fascinated me how people come up with great ideas and what inspires these remarkable creations. Take, for instance, the Harry Potter novels: do you think a computer could one day come up with something similar? I think it is very unlikely; I actually believe there are very few humans who could write something similar. I particularly chose Harry Potter because it is fantasy: it is not something we have experienced, seen or even imagined, and it is not something we can create with a little bit of extra effort. It took J. K. Rowling about five years to write the skeleton of the story, a process that fused life experience, great imagination and powerful writing skills. What actually inspired the story and the characters remains, at least to me, a big mystery.

Research has so far treated humans as gods, building machines with the sole purpose of imitating them. But can't we take this one step further? Can't we make machines more spontaneous, more creative, more interestingly unpredictable? This requires not only imitation, but also improvisation: going beyond what you have learned towards exploring the unknown. (I know some people will be freaked out by this, as it sounds like I'm talking about the rise of the robots, but that is not really what I'm aiming for. What I'm talking about is a system that can understand a human and effectively collaborate with her; a system with which you can share your thoughts while actively waiting for inspiration; a system to which you say "surprise me" and be prepared to be surprised (in a good way :-))).

So, one of the questions I'm interested in answering is: can computers one day surpass humans in creating novel ideas? There has been quite a lot of success in understanding how humans perform relatively simple activities such as vision and speech (especially recently, with the huge success of deep neural networks), but we are still far behind when it comes to understanding more fundamental cognitive processes such as thinking, decision making, emotions, creativity and their relationships.

Games have been the focus of attention for so many people because, let's face it, people like playing games and companies like making money. This, however, has so far been a motivation for making better games: games designed by humans with the help of AI. AI is usually employed to make the game design process easier (generating crowds, making believable non-player characters, or even adjusting the difficulty of the game so that you play more) or to automate tasks such as planning and path finding.

Recently, there has also been some interesting work on artificial creativity and how we can teach computers to search for novelty. The problem domains, however, are still limited, and so are the spaces of actions. Games, on the other hand, are worlds wide open for imagination, creativity and understanding human behaviour. I believe we are still taking our very first steps towards understanding these factors, and it will take us a while before we grasp some solid knowledge about them. But for now, we have an interesting medium and plenty of unanswered questions: a great setup to start digging in.