• This was made back in March 2020 right as COVID hit. After over a year of forgetting to upload this post, I finally uploaded it.
• [2021-09-06]: Added section under Related Work on League-X and League-X Evaluation.

## Background

As mentioned in the last post, there were some problems that needed to be addressed:

• Detect champions even when the champions are partially obscured
• Pings can overlap
• Boxes can too
• Lower detection of false positives / false negatives

In addition to keeping to these standards, we want to:

• Detect champions accurately (bounding boxes are correct)
• Detect all possible champions

Two key components are necessary to train a machine learning model that meets these expectations:

1. Image of minimap
2. Positions of the champions on the minimap

## Abstract

This post explores existing work on minimap champion detection. Prior attempts fall short because they are built on limited and biased data. I will show how to build a simpler model, trained on easy-to-obtain data, that outperforms the existing work by overcoming those data limitations. An extensive evaluation shows that my method is much more robust than existing work and is able to detect all champions.

## Related Work

There has been previous work that attempted to solve the problem of detecting champions on the minimap.

### DeepLeague

DeepLeague was one of the first projects to attempt to detect champions on the minimap. In January 2018, the author, Farza, released DeepLeague, which uses deep learning to detect champions on the minimap. His source code is located here.

Farza gathered data from professional games that were streamed live (back in 2018, streams sent champion data including position from a websocket). Using the positions and the image of the minimap mined from the streams, Farza was able to train a YOLO9000 model that could accurately detect champions on the minimap.

Farza's image detection example (Image source: Medium)

Farza's second image detection example (Image source: Medium)

However, Farza’s project does have some pitfalls.

1. Relies on a biased data source.
• Professionals tend to choose to play certain champions over others. Therefore, some champions may not even appear in the image dataset.
2. Data source is unstable
• What if the data source (data being streamed via websocket) is down or has been changed or removed?
• League of Legends is a game that is always being updated. New champions will be created and professional streams may take a certain amount of time off. Therefore, updates to the object detector can be delayed.
3. Is only able to detect 56 champions
4. No metrics
• There is no metric on how well the detector works. How well does the detector work overall? How well does the detector work in situations where there are overlapping champion images?
• The only metric is to look at the GIFs that Farza has posted
• Analysis of GIF images
• From both GIF images, it seems like Shen frequently fails to be detected when another champion overlaps Shen.
• Gragas from the first image doesn’t have a fully centered bounding box at times, especially when the champion is moving.
• Cho’Gath from the second image is incorrectly labeled as Karma because professionals do not play Cho’Gath often, so Cho’Gath appears rarely in the dataset. Farza also mentions this as a problem.
• I cannot “evaluate” how well the detector works when pings overlap champions because the GIFs do not show an example.

### PandaScore

PandaScore improved on Farza’s work and pushed it into production as an endpoint of their service that customers can request data from. As described in their blog, they trained an initial model, much like Farza’s approach. However, with a higher bar to clear, they wanted a model that could generalize to any champion.

Their approach boiled down to:

• Splitting up the minimap into tiles (So, for example the 512x512 size minimap could be cut into 26x26 squares)
• For each tile, use a classifier to determine what champion is in the tile (or if there isn’t a champion at all)
• The classifier was built using a GAN (Generative Adversarial Network), which helps create a better classifier by modifying the tile with noise (manipulating pixels) so that it can detect champions in situations where other images partially overlap the champion image.
• Use another model, a Convolutional LSTM, which allows the program to memorize and predict the champion’s position even if the champion image has been entirely covered by another image.

Training the GAN to generate images of Ahri to be fed into the classifier (Image source: Medium)

Their approach seems to work pretty well at first glance.

It handles overlaps such as pings and always detects champions.

Pandascore's image generation example (Image source: Medium)

However, it does have pitfalls as well.

1. Over-engineering
• This approach is complex: splitting the minimap into tiles, running a classifier per tile and then running the Convolutional LSTM model to generate the coordinates. By contrast, Farza managed to obtain champion positions with a single object detection system.
2. Relies on a data source
3. Not really “accurate”
• 95% of all detections are within a 20 pixel radius of the target
• The statement may be true, but do we really want positions to be constantly offset from where the icons actually are? A 20 pixel radius is a pretty big range on a 512 x 512 minimap.
• Upon closer inspection of the GIF, most of the detected champion positions are not centered, especially when the champion moves (the detection starts to lag behind).
4. No metrics
• Analysis of GIF image
• In many cases, the detected champions’ positions are not centered and even seem to drift as champions move.
• The detected positions for Camille seem to drag behind her as she moves to the upper left of the minimap.
• Even when stationary and not obstructed by any overlapping images, Swain’s position does not get detected at the center.
• When Camille moves rapidly in different directions near Ornn, her detected position seems locked in place. The same occurs for the four champions in the bottom right-hand corner of the minimap.

### League-X

League-X is a project on GitHub that focuses on detecting champions on the minimap (unlike the projects above, it has no accompanying blog explanation).

Demo of League-X (Image source: Github)

Reading the source code, the workflow is as follows:

Frame where Kayle is incorrectly detected as a ping and Ezreal is detected as Lucian (Image source: Github)

## Recap

The problems mentioned in previous approaches:

• Depends on existing data
• Cannot handle overlapping images well
• Cannot detect all champions
• Does not detect champion positions accurately
• Over-engineered

And most worrying of all:

• No evaluation

## Setup

To tackle the problems, I want my approach to do the following:

• Generate my own data
• Fixes
• Depends on existing data
• Cannot detect all champions
• Cannot handle overlapping images well
• Use a single model to detect all champions on the minimap
• Fixes
• Does not detect champion positions accurately
• Over-engineering with machine learning techniques

Note this focuses on league patch 10.7. Future versions of the game can be different – map layout, champions, etc…

### Generating data

What do you mean generate your own data? You need people playing a game, right?

A trick I came up with is to simply generate a possible image of the minimap that mimics a real scenario as closely as possible.

Minimap generation

At first glance, it may seem like an image of a minimap from a real game, but upon closer inspection, one can see that champions are out of place, some lane minion locations do not make sense and pings are oddly placed.

Below is a walkthrough of the pipeline for generating the images:

Base minimap

Add minion camps that may spawn

Add lane minions to the three lanes

Place wards randomly

Hopefully, this is what the detector can do in the end!

In addition, to be able to detect overlapping champions well, the image generator frequently attempts to partially overlap champions with other champions, pings and lines.

Example of a generated minimap that groups champions together and overlays champions with pings and lines in the center top portion of the minimap

Now, images of a minimap can be generated infinitely! There is no need to constantly find a reliable data source, parse information and store the results somewhere.
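A minimal sketch of the labeling side of such a generator, ignoring the actual image compositing. The names `place_champions`, `MAP_SIZE` and `ICON_SIZE`, the icon size and the overlap probability are all hypothetical and are not taken from the real generator. The point it illustrates is why generated data comes with free labels: the generator chooses where each icon is pasted, so the bounding box is known exactly.

```python
import random

MAP_SIZE = 256   # assumed minimap side length in pixels
ICON_SIZE = 25   # assumed champion icon side length in pixels


def place_champions(names, overlap_prob=0.5, rng=None):
    """Pick a position for each champion icon and return its label.

    Each label is (name, x_min, y_min, x_max, y_max). With probability
    `overlap_prob`, a champion is placed close to the previous one so that
    the two icons partially overlap.
    """
    rng = rng or random.Random()
    labels = []
    for name in names:
        if labels and rng.random() < overlap_prob:
            # Offset from the previous icon by roughly half an icon at most.
            _, px, py, _, _ = labels[-1]
            x = px + rng.randint(-ICON_SIZE // 2, ICON_SIZE // 2)
            y = py + rng.randint(-ICON_SIZE // 2, ICON_SIZE // 2)
        else:
            x = rng.randint(0, MAP_SIZE - ICON_SIZE)
            y = rng.randint(0, MAP_SIZE - ICON_SIZE)
        # Clamp so the icon stays fully inside the minimap.
        x = min(max(x, 0), MAP_SIZE - ICON_SIZE)
        y = min(max(y, 0), MAP_SIZE - ICON_SIZE)
        labels.append((name, x, y, x + ICON_SIZE, y + ICON_SIZE))
    return labels


labels = place_champions(["Ezreal", "Yuumi", "Shen"], rng=random.Random(0))
```

A real generator would paste the icon image at `(x_min, y_min)` on top of the base minimap and then draw pings and lines over it; the label list is what the detector trains against.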

### Architecture

Now that there is a way to continuously generate images, we can build a system to quickly train our model.

The system will be split into two parts:

• a generator that produces images
• a model that takes in the generated images and trains on them.

The generator will fill a thread-safe multi-producer, multi-consumer queue and the model will take from the queue. This allows the generator and the model to work (mostly) independently of each other, instead of the model waiting for images to be generated and vice versa.

The model trains on images taken from a queue which the generator fills

Since the generator operates independently, the generator can be replicated many times to generate different images at the same time and they can fill the same queue.

Architecture with many generators filling the same queue
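The queue arrangement can be sketched with Python's standard `queue.Queue`, which is thread-safe and supports multiple producers and consumers. This is an illustration under assumptions, not the actual training loop: `generator_worker` and `consume` are hypothetical names, and the "images" are placeholder tuples.

```python
import queue
import threading


def generator_worker(q, worker_id, n_images):
    """Stand-in generator: pushes placeholder 'images' into the shared queue."""
    for i in range(n_images):
        q.put((worker_id, i))  # blocks while the queue is full


def consume(q, n_batches, batch_size):
    """Stand-in trainer: pulls `batch_size` items from the queue per iteration."""
    batches = []
    for _ in range(n_batches):
        batches.append([q.get() for _ in range(batch_size)])  # blocks while empty
    return batches


image_queue = queue.Queue(maxsize=32)  # bounded so generators cannot run far ahead
workers = [
    threading.Thread(target=generator_worker, args=(image_queue, w, 8), daemon=True)
    for w in range(4)
]
for t in workers:
    t.start()

# Four generators produce 8 images each; the consumer takes 4 batches of 8.
batches = consume(image_queue, n_batches=4, batch_size=8)

for t in workers:
    t.join()
```

Bounding the queue with `maxsize` is a design choice in this sketch: it keeps memory use flat when the generators are faster than the model.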

### Model

There are many models that can be used for object detection. Some of the more popular architectures currently include YOLOv3 and Faster R-CNN.

Let’s keep things simple. I will use the Faster R-CNN ResNet50 model used in the PyTorch object detection finetuning tutorial from the official PyTorch website. However, in this case, there is a slight change: the model will not be fine-tuned, but trained from scratch.

In 1 - Finetuning from a pretrained model, modifications are made to train the model from scratch and detect the correct number of classes.

pretrained=True needs to be set to False

num_classes needs to be set to 1 + 148, which is the background + the total number of champions currently in the game.

Result code:
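A minimal sketch of those two changes, assuming torchvision's `fasterrcnn_resnet50_fpn` constructor as used in the tutorial. The helper name `build_model` is hypothetical, and passing `pretrained_backbone=False` is my assumption about what training from scratch implies. Newer torchvision releases express the same thing with `weights=None` and `weights_backbone=None`.

```python
NUM_CHAMPIONS = 148                # champions in the game as of patch 10.7
NUM_CLASSES = 1 + NUM_CHAMPIONS    # class 0 is reserved for the background


def build_model():
    """Build an untrained Faster R-CNN with one output class per champion."""
    # Imported lazily so the constants above can be used without torchvision.
    import torchvision

    return torchvision.models.detection.fasterrcnn_resnet50_fpn(
        pretrained=False,            # do not load COCO detection weights
        pretrained_backbone=False,   # assumption: backbone also starts untrained
        num_classes=NUM_CLASSES,
    )
```

Because `num_classes` is passed directly to the constructor, there is no need to swap out the box predictor head as the tutorial does for its fine-tuning case.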

## Evaluation/Results

### Image generation

On an Intel i5 CPU with 4 cores and an SSD, image generation takes on average about 0.066802 seconds per generated image with a single thread. With four threads running, it is about 0.033279 seconds per image. That does not sound too bad, a 2x increase in speed!

Why not 4x since there are four threads and the CPU has 4 cores? Time to investigate. Hmm, perhaps the queue is the issue since it is the only object that is shared between threads? The measured average time per thread to put an image inside the queue is 0.000049402 seconds, so the queue does not seem to be the issue. After much thought, the bottleneck most likely comes from the Global Interpreter Lock, which prevents multiple threads from executing Python bytecode at the same time. Anyways, I probably should not discuss it further.

Oops, oh well. I used threading instead of multiprocessing, but at least I got a twofold increase in overall speed by using threads.

Code to calculate average time spent for single thread (1000 images):

Code to calculate average time spent for four threads (1000 images):
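A sketch of how such a measurement could be taken, assuming a placeholder `generate_image` function in place of the real generator; both function names here are hypothetical. It times the wall-clock duration of producing 1000 images with one thread and with four, then divides by the image count.

```python
import threading
import time


def generate_image():
    """Placeholder for the real generator; burns a little CPU in pure Python."""
    return sum(i * i for i in range(2000))


def average_time(n_images, n_threads):
    """Wall-clock seconds per image when `n_threads` threads share the work."""
    per_thread = n_images // n_threads

    def work():
        for _ in range(per_thread):
            generate_image()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed / (per_thread * n_threads)


single = average_time(1000, 1)
four = average_time(1000, 4)
```

With a pure-Python placeholder like this one, the four-thread figure is not expected to improve much, since the Global Interpreter Lock serializes the work. A speedup with threads, such as the 2x measured above, suggests that part of the real generator's time is spent in code that releases the lock (for example native image routines or disk reads), though that is a guess rather than something measured here.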

The model training speed is another factor to evaluate.

The average time per iteration (image generation + model training) is 1.10322 seconds (calculated by dividing the total time spent training by the number of iterations).

With image production being so quick, model training now becomes the bottleneck: per iteration, training takes 0.83699 seconds while generating the eight images takes 0.26623 seconds.

This is calculated by multiplying the time it takes to generate one image by the number of images needed for one model iteration, then subtracting that from the average time spent per iteration.

0.033279 * 8 = 0.26623 where 8 is the number of images generated in one iteration. The result is the average time spent per iteration generating images.

1.10322 - 0.26623 = 0.83699. Subtracting the time spent generating images from the total average time per iteration gives the time spent by the model updating its weights.
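The same arithmetic as a short script, using only the numbers quoted above (the variable names are mine). It also derives the share of each iteration spent on training, which comes out to roughly three quarters.

```python
time_per_image = 0.033279      # seconds per image with four generator threads
images_per_iteration = 8
time_per_iteration = 1.10322   # seconds, image generation + model training

# Time spent generating the images needed for one iteration.
generation_time = time_per_image * images_per_iteration

# Whatever remains of the iteration is spent updating the model's weights.
training_time = time_per_iteration - generation_time

# Fraction of an iteration spent on training.
training_share = training_time / time_per_iteration
```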

Pie chart of time spent per iteration

### Model

Loss graph of training model:

Overall loss while training

Here are tensorboard graphs of five trained models (“Latest” is the model I am referring to in this post)

The model was trained for 8 hours.

As one can see, the model’s loss is decreasing, which means it is hopefully learning something!

Let’s check if it actually learned to detect champion positions.

Looking pretty good!

Kindred is still detected even when overlapped almost completely by Rakan. It may be hard to notice, but there are two boxes containing Rakan and Kindred in the image with predictions.

#### How well does the model perform in a game in real-time?

Running replay at 4x speed with the object detector at a 0.6 classifier threshold

However, evaluating an object detector by observing an image or a GIF is not a sufficient metric.

#### Metrics

I hand labeled 150 images, resulting in 1213 labeled champions. I used labelImg as a tool to label the images. Each labeled champion has a corresponding bounding box and label. Labeling took a couple of hours.

One might ask: Why not use Farza’s dataset of over 100,000 labeled images instead?

Farza’s dataset does not have accurate bounding boxes. The bounding boxes do not tightly fit the champion icon. In addition, I cannot be sure that the dataset is free of errors or that the labeling is up to standard, since I did not label the images myself and the labels were automatically generated.

Some background terms before moving forward:

• IoU - Intersection over Union measures the overlap between the predicted bounding box and the ground truth bounding box: the area of their intersection divided by the area of their union. The higher the IoU, the closer the predicted bounding box is to the ground truth bounding box. The value ranges from 0.0 to 1.0.

Intersection over Union example (Image source: TowardsDataScience)
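For axis-aligned boxes, IoU takes only a few lines of code. This is a generic sketch rather than the evaluation code used for this post; boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0, ix_max - ix_min)
    inter_h = max(0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# A 25x25 box shifted right by one pixel still overlaps heavily.
shifted = iou((0, 0, 25, 25), (1, 0, 26, 25))
```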

• Score - This is an output from the object detector, which measures how confident the object detector is in what it has predicted. Basically, how sure am I that the bounding box contains a picture of, say, Ezreal? This value ranges from 0.0 to 1.0.

Graphs:

The rows are listed in the order of: true positives, false positives, false negatives, extra champions detected, precision, recall, accuracy iou, and accuracy in 5 pixels.

• True positives - The number of champions detected out of all ground truth champions in the images.
• False positives - The number of champions detected with an IoU less than the IoU threshold. This is based on the most popular object detection metrics repository on github
• False negatives - The number of champions that were not detected in the images, but the champions actually were in the ground truth images
• Extra Champions Detected - Champions that were detected, but were not in the image. This could count as a false positive, but the top two most popular object detection metrics repositories calculate a false positive only on IoU. Here and here
• Precision - How accurate were the predictions? This is measured as true positives / (true positives + false positives)
• Recall - How well can the objects be detected? This is measured as true positives / (true positives + false negatives)
• Accuracy IoU - The percentage of the number of champions detected out of all ground truth champions that have an IoU over the threshold.
• Accuracy in 5 pixels - The percentage of the number of champions detected out of all ground truth champions that have a bounding box within 5 pixels. This means that the four points constructing a predicted bounding box are within 5 pixels of the ground truth bounding box’s four points.
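The definitions above translate into small helper functions. This is a sketch under assumptions: the function names are mine, and checking "within 5 pixels" independently on each box coordinate is one possible reading of the description, not necessarily the exact check used in the evaluation.

```python
def precision(true_positives, false_positives):
    """true positives / (true positives + false positives)."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def recall(true_positives, false_negatives):
    """true positives / (true positives + false negatives)."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def within_pixels(predicted, truth, tolerance=5):
    """True if every box coordinate is within `tolerance` pixels of the truth.

    Boxes are (x_min, y_min, x_max, y_max). Comparing coordinate by
    coordinate is an assumed interpretation of "within 5 pixels".
    """
    return all(abs(p - t) <= tolerance for p, t in zip(predicted, truth))


close_enough = within_pixels((10, 10, 35, 35), (12, 8, 38, 40))
too_far = within_pixels((10, 10, 35, 35), (12, 8, 38, 41))
```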

The left column measures each attribute as the score threshold increases by 0.05 from 0.0 to 1.0. Remember that the score threshold measures how confident the classifier is that the object in the bounding box is a certain champion. The IoU threshold is set to 0.5.

The right column measures each attribute as the IoU threshold increases by 0.05 from 0.0 to 1.0. Remember that the IoU threshold measures how close the predicted bounding box is to the ground truth bounding box. The score threshold is set to 0.5.

There are some interesting points to take away from the graphs.

Score threshold graphs:

• True positives - Slowly decreases as score threshold increases because the object detector has to be more confident in its predictions
• False positives - Does not change. The IoU threshold is not changing, so the number of false positives does not change. Since precision depends on false positives, precision does not change either.
• False negatives - Slowly increases as score threshold increases, for the same reason as in True positives: fewer champions are detected because the object detector is less confident in some cases. This affects the recall graph as well, which exhibits a similar result
• Extra champions detected - Decreases as score threshold increases like a log graph. As the object detector becomes more confident in its classifications (score increasing), it has to be more sure that the champion in the bounding box is actually the correct champion.
• Accuracy IoU / Accuracy in 5 pixels - Both decrease as score threshold increases, like the graph for true positives. What is interesting is the Accuracy in 5 pixels’ values. On average, the bounding boxes are within a 5 pixel range 80% of the time.

IoU threshold graphs:

• True positives / precision / recall - Decrease rapidly at the end. It seems like the IoU threshold only makes a dramatic difference when it is high. This is good as it means that the object detector makes a lot of bounding box predictions close to the ground truth bounding box.
• False negatives / Extra champions detected / Accuracy in 5 pixels - Do not change. These attributes are not affected by IoU because they only depend on whether or not the object is detected (not dependent on IoU).

A standard metric in object detection is AP/mAP (Average Precision/Mean Average Precision). Essentially, the value corresponds to the area under the precision-recall graph. The higher the value, the better the object detector is. Why? If the area under the precision-recall graph is large, it means that precision stays high as recall increases. AP can be measured at different IoUs, which I will show and analyze. The AP formula I will be using is here, which is a Python adaptation of the PASCAL VOC2012 challenge AP formula. The AP calculation is done by sampling points from the precision-recall graph.

The left column is the precision-recall graph before the AP calculation is done. The right is the graph after the AP calculation is done. As one can see, the AP calculation samples from the graph to perform its calculation and, as a result, “smooths” out the graph.
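A sketch of a VOC2012-style AP calculation, assuming the all-point interpolation commonly used for that challenge. Whether it matches the linked implementation line for line is an assumption, and `average_precision` is a name chosen here. The smoothing step replaces each precision value with the highest precision found at any equal or higher recall, then the area under the resulting step curve is summed.

```python
def average_precision(recalls, precisions):
    """Area under the smoothed precision-recall curve.

    `recalls` must be sorted in non-decreasing order and have the same
    length as `precisions`.
    """
    # Pad so the curve spans recall 0.0 to 1.0.
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]

    # Smoothing: walk right to left so precision never increases with recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum the rectangle areas wherever recall changes.
    ap = 0.0
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i - 1]:
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


# The dip to 0.5 precision at recall 0.4 is smoothed up to 0.6.
example_ap = average_precision([0.2, 0.4, 0.6], [1.0, 0.5, 0.6])
```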

#### AP calculation per IoU

| IoU | AP |
| --- | --- |
| 0.50 | 0.999 |
| 0.55 | 0.993 |
| 0.60 | 0.961 |
| 0.65 | 0.877 |
| 0.70 | 0.693 |
| 0.75 | 0.452 |

One interesting point to notice is that AP decreases dramatically between the 0.65 and 0.75 IoU thresholds. The AP at 0.75 IoU is quite low, considering that the AP at 0.75 IoU for the COCO dataset (detection of common objects: humans, cars, bags, chairs, etc.) is 0.585 (at the time of writing). The COCO dataset is a substantially harder dataset, so why is the AP so low for detecting champions?

I suspect the issue is due to my image detection size. The images I feed are 256x256 pixels in size and champions are, on average, 25x25 pixels in size. This means that IoU values can change in huge intervals when bounding boxes shift by one pixel.

In addition, ground truth bounding box labels are tight while the object detection system bounding boxes are larger. Because the predicted bounding boxes are usually larger than ground truth bounding boxes, there is an increasingly smaller margin of error as the IoU threshold increases.
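A small worked example makes the margin concrete. It assumes an idealized case that is not measured from the detector: a square ground truth box and a perfectly centered predicted box that is a few pixels larger on every side. In that case the IoU is simply the ratio of the two areas.

```python
def iou_with_padding(icon_size, padding):
    """IoU of a square box and a concentric box `padding` pixels larger per side.

    The smaller box lies entirely inside the larger one, so the intersection
    is the smaller area and the union is the larger area.
    """
    inner_area = icon_size ** 2
    outer_area = (icon_size + 2 * padding) ** 2
    return inner_area / outer_area


# IoU for a 25x25 icon when the prediction is 0 to 3 pixels too large per side.
ious = {padding: iou_with_padding(25, padding) for padding in range(4)}
```

For a 25x25 icon, one pixel of padding per side gives an IoU of about 0.86, two pixels about 0.74 and three pixels about 0.65. Under these assumptions, a perfectly centered prediction that is only two pixels too large on each side already fails the 0.75 threshold.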

Left is the ground truth bounding box for Ezreal. Right is the predicted bounding box. Notice how much larger the predicted bounding box is.

Another interesting point to bring up is the 0.90 - 0.99 AP range for 0.50 - 0.60 IoU. The AP is extremely high when the IoU is in these ranges. The AP should be higher for higher IoU thresholds.

I suspect that if the image size were to be larger, say 512x512, the AP at higher IoU thresholds such as at 0.75 would be higher.

### Comparison as a metric

Graphs and numbers are great on their own, but a comparison greatly benefits a person’s understanding of how well a system performs. For example, reading 1MB from memory takes around 250,000ns on average. How does one interpret the speed from just a number? Knowing that reading 1MB from disk takes 20,000,000ns gives a sense of how much faster memory access is than disk access (~80x faster).

Therefore, I am going to use Farza’s object detector as a comparison. I would compare Pandascore’s object detector as well, but the company has not released their work to the public.

I will perform the same evaluation techniques to his object detector and analyze how both object detectors compare. Farza feeds in 295x295 pixel images, which get rescaled to 416x416 pixels. My images are 256x256 pixels in size and are rescaled as well. Do note that some champion images have changed such as Fiddlesticks or have been introduced such as Sett since Farza’s release, so I have only considered the champions he has included in his dataset (56 champions) when detecting champions in my images. Here is the modified version of DeepLeague that does the evaluation.

To reiterate, the order of the rows is: true positives, false positives, false negatives, extra champions detected, precision, recall, accuracy iou, and accuracy in 5 pixels.

The left column measures each attribute as the score threshold increases by 0.05 from 0.0 to 1.0. Remember that the score threshold measures how confident the classifier is that the object in the bounding box is a certain champion. The IoU threshold is set to 0.5.

The right column measures each attribute as the IoU threshold increases by 0.05 from 0.0 to 1.0. Remember that the IoU threshold measures how close the predicted bounding box is to the ground truth bounding box. The score threshold is set to 0.5.

Some key points to mention:

Score threshold graphs:

• True positives - Extremely low, being only able to capture 1/6 of the ground truth champions.
• False negatives - Increases exponentially as score threshold increases. For my object detector, it increases linearly as score threshold increases.
• Extra champions detected - Interesting plateau in the beginning. The score threshold does not affect the graph until 0.3.
• Accuracy IoU / Accuracy in 5 pixels - Extremely low. Accuracy in 5 pixels is almost zero, meaning that most of the bounding boxes are most likely not within 5 pixels of the ground truth.

IoU threshold graphs:

• True positives - True positives seem to be a lot higher at lower IoU thresholds. This means that his bounding boxes deviate from the ground truth a lot more than my object detector’s predicted bounding boxes.
• False positives - Has a similar curve to my graph, which is interesting.
• Precision / Recall / Accuracy IoU - Starts at a certain point and decreases linearly. My graphs stay flat at first and then decrease. IoU thresholds only start to affect my model above 0.6, while all IoU thresholds affect his performance.

Again, the left column is the precision-recall graph before the AP calculation is done. The right is the graph after the AP calculation is done.

#### AP calculation per IoU

| IoU | AP |
| --- | --- |
| 0.50 | 0.200 |
| 0.55 | 0.169 |
| 0.60 | 0.166 |
| 0.65 | 0.155 |
| 0.70 | 0.133 |
| 0.75 | 0.119 |

The AP of Farza’s object detector is low at all of these IoU thresholds. My object detector has ~4x his AP at 0.75 IoU (0.452 AP vs 0.119 AP). The difference is even more noticeable at 0.50 IoU (0.999 AP vs 0.2 AP).

### Comparison conclusion

My object detector significantly outperforms Farza’s object detector. The graphs and AP show clear differences between the performance of the two object detectors.

Some images show the difference in detail (left is Farza’s output, right is my output):

A lot of champions are detected as Jayce. It seems like Elise, Jarvan, and Galio are detected with high scores (caused by Farza’s unbalanced dataset where these champions appear frequently)

None of the champions detected are correct except for Graves

In the top image, Renekton is not detected, but Renekton is detected in the bottom image. Renekton is not overlapped by anything in the top image. Looks like Farza’s object detector has trouble detecting champions in the bottom left corner where Ezreal is. In addition, Farza’s object detector seems to have trouble detecting Ekko when Ekko is obscured by Irelia in the top image.

### Applying metrics

Metrics are not just there for show and tell! One can use them to optimize their system to get the best performance possible. Farza defaults his score threshold to 0.3. From the graphs, the threshold lands at a point right before the number of false negatives starts to rise and right before the number of extra champions detected starts to decrease. Perhaps he wanted to maximize recall.

What is the optimal score threshold that one should use? Well, it depends. Look at the extra champions detected graph and the false negatives graph. There is a trade-off. If one wants fewer extra champions to be detected, raise the score threshold. Otherwise, lower the score threshold to have fewer false negatives.

Since the extra champions detected graph is shaped like a log graph rather than a straight line as score threshold increases, one should compare the two graphs closely to choose a value. Perhaps find a point in the extra champions detected graph whose slope closely matches, in magnitude, the slope in the false negatives graph to strike a good balance? 0.6 seems to be a good value where extra champions detected has a slope of -180 and false negatives has a slope of 160. (Note that the slope of extra champions detected is negative because the values are decreasing while the slope of false negatives is positive because the values are increasing)
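The slope-matching idea can be sketched as follows. The function names are hypothetical and the counts are made-up illustrative data, chosen only to echo the slopes quoted above; they are not the measured values behind the graphs.

```python
def slopes(xs, ys):
    """Forward-difference slope between each pair of consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


def balanced_threshold(thresholds, extra_detected, false_negatives):
    """Threshold at the start of the interval where both slopes are closest in magnitude."""
    extra_slopes = slopes(thresholds, extra_detected)
    fn_slopes = slopes(thresholds, false_negatives)
    best = min(
        range(len(extra_slopes)),
        key=lambda i: abs(abs(extra_slopes[i]) - abs(fn_slopes[i])),
    )
    return thresholds[best]


# Made-up counts: extra detections fall off quickly, false negatives rise steadily.
thresholds = [0.2, 0.4, 0.6, 0.8]
extra_detected = [400, 150, 60, 24]
false_negatives = [20, 40, 72, 104]

chosen = balanced_threshold(thresholds, extra_detected, false_negatives)
```

With this data the slopes beyond 0.6 are about -180 and 160, the closest match of the three intervals, so the function returns 0.6.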

## Conclusion

Going back to the problem list, this approach manages to:

• Be data source independent
• Handle overlapping images well
• Detect all champions
• Detect champion positions accurately (in a 5 pixel threshold)
• Be simple (the model was built using a two-line change from a PyTorch tutorial)

and provided:

• Metrics to show that it works well
• Evidence that it performs significantly better than another system

### Discussion and Future Work

#### Why does the system not handle entirely covered champion images?

It simply does not need to. One can assume that champions that are not detected are still at their previous location because, most of the time, the champion is mostly or fully obscured for only a couple of frames. In the bot lane, one can observe that this occurs often, but the champions stay in close proximity to their previous position most of the time.

If it is necessary to approximate champion positions, one method is to use a Kalman Filter to produce estimates of the current positions using detected previous positions.
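As a rough illustration of that idea, here is a one-dimensional constant-velocity Kalman filter that would be run once per axis. Everything in it is an assumption made for the sketch: the class and function names, the noise variances, and the simplified diagonal process noise. When a detection is missing (`None`), the filter only predicts, so the estimate keeps moving with the last estimated velocity.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (x or y)."""

    def __init__(self, position, process_var=1.0, measurement_var=4.0):
        self.pos, self.vel = float(position), 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = process_var, measurement_var

    def predict(self, dt=1.0):
        """Advance the state by one frame: P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.pos += self.vel * dt
        (p00, p01), (p10, p11) = self.p
        self.p = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.pos

    def update(self, measured):
        """Correct the state with a detected position (measurement H = [1, 0])."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r                # innovation variance
        k0, k1 = p00 / s, p10 / s       # Kalman gain
        residual = measured - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P = (I - K H) P
        self.p = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


def track(measurements):
    """Estimate a position per frame; None marks a frame with no detection."""
    kf = AxisKalman(measurements[0])
    estimates = [kf.pos]
    for z in measurements[1:]:
        kf.predict()
        if z is not None:
            kf.update(z)
        estimates.append(kf.pos)
    return estimates


# A champion moving right at 2 px/frame, hidden for two frames.
xs = track([10, 12, 14, 16, None, None, 22])
```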

#### Why use FasterRCNN? Why not YOLO?

It was used in the PyTorch object detection tutorial and was the simplest to adapt. Other models such as YOLO can definitely be used in place of FasterRCNN (for speed or model size).

Yuumi is a champion that is able to attach to other champions, as shown below.

On the left, Ezreal and Yuumi are separated, but on the right, Yuumi attaches to Ezreal (as shown by a special red border).

As one can see, Yuumi’s champion icon disappears from the minimap when she attaches, so the object detector can no longer pick her up. The question then becomes: how do I detect Yuumi when she attaches to another champion?

There are a couple of ideas I have in mind. One approach is to use classical computer vision techniques to detect champions with the special border (thus figuring out that Yuumi exists). Another approach is to add a copy of all champions in League, but with Yuumi added, so that the object detector can detect a champion with Yuumi attached. So, for example, Ezreal with Yuumi would be a class that would be distinct from Ezreal. However, this approach doubles the number of classes for the object detector to handle.

#### What about champions that can clone themselves?

Examples of champions that can clone themselves include Leblanc, Wukong, Neeko and Shaco. When these champions use an ability to clone themselves, another copy of their champion icon appears on the minimap, as shown below.

Wukong clones himself, creating two copies of Wukong on the minimap

In some cases, assumptions can be made to detect the actual champions in these scenarios. For example, Neeko’s clone moves in a straight line and Wukong’s clone stays still. However, there is no perfect solution for these cases. Players are able to use the champions’ clone ability to trick other players into thinking the real champion is a clone. Players can also mimic their clones’ actions. For example, Neeko can move in a straight line like her clone and Wukong can stay still alongside his clone. Thus, simple algorithms that only detect clone movements can produce a significant number of false positives.

#### Why use CNNs when you can use something less complex?

This was primarily a learning experience for myself; the goal was not to build a perfect end-to-end solution. There are definitely better ways to handle the problem, like the computer vision techniques in the previous post (which use far fewer resources, which is incredibly important when running the program alongside a live game).

#### Why use Faster R-CNN ResNet50 when you can use something less complex?

Yup. This is a huge model. Why use it? Cause it was part of the torchvision tutorial and was extremely easy to modify to get working. Could definitely use something way less complex as a model. For example, yolo :)

#### Where is the evaluation on FPS?

Around 20-30 FPS on my machine (4 cores, 2060) playing a game in real time. Quite an important factor I left out. I could have dived deep into this and plotted some nice graph comparisons, but this blog is already long as it is. A suggestion for higher frame rates is to slow down the footage or replay (0.5x). If you’re looking to run in-game, I suggest you change the model (as I stated above).

#### League-X Evaluation

There is no comparison with League-X because the evaluation framework cannot be easily adapted to League-X and I do not want to put more effort into evaluations anymore (I found League-X after I had written the entire post already). It shares some of the issues of the other projects: not all champions are in the dataset and, in particular, the training set for champions is very limited, so the results should be somewhat similar to the evaluation of DeepLeague.

#### Source

The source is published here

#### End

Thanks for reading this post. As always, feel free to reach out to me on twitter or comment below if you have any questions!

Special thanks to Liam Dugan and Justin Shaw for giving advice, reviewing this post and providing fantastic feedback! :)