League of Legends data scraping the hard and tedious way for fun

Before we begin

  • Most of the work was done a couple years back (during COVID), so significant parts may have changed since then

Disclaimer

This work isn’t endorsed by Riot Games and doesn’t reflect the views or opinions of Riot Games or anyone officially involved in producing or managing League of Legends. League of Legends and Riot Games are trademarks or registered trademarks of Riot Games, Inc.

Table Of Content

Overview

League of Legends is one of the world’s most popular competitive games, with millions of players generating vast amounts of gameplay data daily. Basic match statistics are available, but accessing moment-by-moment gameplay data is near impossible. This article demonstrates how to create a high-fidelity dataset by reverse engineering the game engine, capturing information such as precise player positions to ability usage timings and damage calculations.

Current League of Legends datasets and analytics tools face several limitations:

  • Granularity: Most available data comes from Riot’s public APIs, which only provide aggregated statistics (total kills, gold earned, etc.) rather than detailed gameplay events
  • Precision: Existing tools that sample game state often miss critical micro-interactions
  • Completeness: Important gameplay elements like exact ability used, hiding in fog, positions data are missing

Because of this third-party developers can only build basic tools and datasets. We haven’t seen announcements with regards to game-playing agents unlike other games where high fidelity data has enabled breakthoughs in reinforcement learning ~ OpenAI5, AlphaStar.

I built a tool that can create high fidelity datasets. It directly decrypts and processes the game’s internal replay files, but keeps the game’s native data format. It can capture precise details like ability usage, movement, and state changes at millisecond intervals and can scale efficiently.

I’ll walk through the technical challenges involved in reverse engineering League’s proprietary game engine and replay format. These methods can be applied to similar problems in data extraction.

Expand to show sample of data scraped from a game
{
  [
    {
      "CreateSummoner": {
        "time": 0.0,
        "id": 1073741859,
        "name": "SomeUserName",
        "champion": "akali"
      }
    },
    {
      "CastSpell": {
        "time": 10.23234,
        "champion_caster_id": 1073741859,
        "spell_name": "AkaliE",
        "level": 1,
        "source_position": {
          "x": 14045.15,
          "z": 13559.334
        },
        "target_position": {
          "x": 14400.824,
          "z": 14443.168
        },
        "target_end_position": {
          "x": 0.0,
          "z": 0.0
        },
        "target_ids": [],
        "windup_time": 0.25,
        "cooldown": 14.5,
        "mana_cost": 30.0,
        "slot": 2
      }
    },
    {
      "UpdateState": {
        "time": 721.11426,
        "1073741859": {
          [
            {
              "name": "health",
              "data": 1516.6107
            },
            {
              "name": "movement_speed",
              "data": 270.777
            },
          ]
        }
      }
    },
    {
      "BasicAttackAtTarget": {
        "time": 122.12,
        "source_id": 1073741859,
        "target_id": 1073741858,
        "source_position": {
          "x": 9222.389,
          "z": 2501.3594
        },
        "target_position": {
          "x": 9266.0,
          "z": 2522.0
        },
        "slot": 49,
        "spell_name": "AkaliP",
        "level": 0,
        "target_end_position": {
          "x": 0.0,
          "z": 0.0
        },
        "target_ids": [
          1073741858
        ],
        "windup_time": 1.3504388,
        "cooldown": 0.0,
        "mana_cost": 0.0
      }
    },
    {
      "CreateEntity": {
        "time": 722.9974,
        "id": 1073770044,
        "position": {
          "x": 9326.244,
          "z": 2522.7634
        },
        "name": "TwilightShroud",
        "level": 1,
      }
    },
    {
      "BecomeVisibleInFogOfWar": {
        "time": 862.4249,
        "id": 1073741859
      }
    },
    {
      "Death": {
        "time": 860.31085,
        "id": 1073741859,
      }
    }
  ]
}

Current Datasets And Their Issues

League of Legends is a popular MOBA game. With millions of games played daily, several companiesBackseat.ggblitz.ggMobalytics have invested significant effort in analyzing gameplay data to improve the player experience. However, detailed, fine-grained data remains relatively hard to obtain.

Typical data sources provide only limited, high-level information. Despite these limitations, people build basic models use this data to build basic win prediction models Kaggle or create user interfaces for gameplay analysisMobalytics.

The following section provides different ways to fetch data and what datasets exist out there. I’ll show and explain what these datasets contain.

Most existing datasets are created by scraping recent games through the official RIOT matches API. When provided with a match ID, this API returns a list of participating players and various game statistics and metrics.

Expand to show match descriptionhttps://developer.riotgames.com/apis#match-v5/GET_getMatch
Name Data Type Description
endOfGameResult string Refer to indicate if the game ended in termination.
gameCreation long Unix timestamp for when the game is created on the game server (i.e., the loading screen).
gameDuration long Prior to patch 11.20, this field returns the game length in milliseconds calculated from gameEndTimestamp - gameStartTimestamp. Post patch 11.20, this field returns the max timePlayed of any participant in the game in seconds.
gameEndTimestamp long Unix timestamp for when match ends on the game server. This timestamp can occasionally be significantly longer than when the match “ends”.
gameId long  
gameMode string Refer to the Game Constants documentation.
gameName string  
gameStartTimestamp long Unix timestamp for when match starts on the game server.
gameType string  
gameVersion string The first two parts can be used to determine the patch a game was played on.
mapId int Refer to the Game Constants documentation.
participants List[ParticipantDto]  
platformId string Platform where the match was played.
queueId int Refer to the Game Constants documentation.
teams List[TeamDto]  
tournamentCode string Tournament code used to generate the match. This field was added to match-v5 in patch 11.13 on June 23rd, 2021.
allInPings int Yellow crossed swords
assistMePings int Green flag
assists int  
baronKills int  
bountyLevel int  
champExperience int  
champLevel int  
championId int Prior to patch 11.4, on Feb 18th, 2021, this field returned invalid championIds.
championName string  
commandPings int Blue generic ping (ALT+click)
championTransform int This field is currently only utilized for Kayn’s transformations. (Legal values: 0 - None, 1 - Slayer, 2 - Assassin)
consumablesPurchased int  
challenges ChallengesDto  
damageDealtToBuildings int  
damageDealtToObjectives int  
damageDealtToTurrets int  
damageSelfMitigated int  
deaths int  
detectorWardsPlaced int  
doubleKills int  
dragonKills int  
eligibleForProgression boolean  
enemyMissingPings int Yellow questionmark
enemyVisionPings int Red eyeball
firstBloodAssist boolean  
firstBloodKill boolean  
firstTowerAssist boolean  
firstTowerKill boolean  
gameEndedInEarlySurrender boolean  
fistBumpParticipation int  
voidMonsterKill int  
abilityUses int  
acesBefore15Minutes int  
alliedJungleMonsterKills float  
baronTakedowns int  
blastConeOppositeOpponentCount int  
bountyGold int  
buffsStolen int  
completeSupportQuestInTime int  
controlWardsPlaced int  
damagePerMinute float  
damageTakenOnTeamPercentage float  
dancedWithRiftHerald int  
deathsByEnemyChamps int  
dodgeSkillShotsSmallWindow int  
doubleAces int  
dragonTakedowns int  
legendaryItemUsed List[int]  
effectiveHealAndShielding float  
elderDragonKillsWithOpposingSoul int  
elderDragonMultikills int  
enemyChampionImmobilizations int  
enemyJungleMonsterKills float  
epicMonsterKillsNearEnemyJungler int  
epicMonsterKillsWithin30SecondsOfSpawn int  
epicMonsterSteals int  
epicMonsterStolenWithoutSmite int  
firstTurretKilled int  
firstTurretKilledTime float  
flawlessAces int  
fullTeamTakedown int  
gameLength float  

These fields provide coarse, post-processed data from Riot, typically containing aggregate statistics like total kills, wards placed, and other cumulative events.

The Live Client API (LCU) offers a more detailed alternative, but requires running the game client. While it provides event timestamps (like deaths, destroyed turrets, and objective kills), it still lacks crucial data points such as player positions and ability usage timestamps.

Expand to show LCU API json examplehttps://static.developer.riotgames.com/docs/lol/liveclientdata_sample.json
{
  "activePlayer": {
    "abilities": {
      "E": {
        "abilityLevel": 0,
        "displayName": "Molten Shield",
        "id": "AnnieE",
        "rawDescription": "GeneratedTip_Spell_AnnieE_Description",
        "rawDisplayName": "GeneratedTip_Spell_AnnieE_DisplayName"
      },
      "Passive": {
        "displayName": "Pyromania",
        "id": "AnniePassive",
        "rawDescription": "GeneratedTip_Passive_AnniePassive_Description",
        "rawDisplayName": "GeneratedTip_Passive_AnniePassive_DisplayName"
      },
      "Q": {
        "abilityLevel": 0,
        "displayName": "Disintegrate",
        "id": "AnnieQ",
        "rawDescription": "GeneratedTip_Spell_AnnieQ_Description",
        "rawDisplayName": "GeneratedTip_Spell_AnnieQ_DisplayName"
      },
      "R": {
        "abilityLevel": 0,
        "displayName": "Summon: Tibbers",
        "id": "AnnieR",
        "rawDescription": "GeneratedTip_Spell_AnnieR_Description",
        "rawDisplayName": "GeneratedTip_Spell_AnnieR_DisplayName"
      },
      "W": {
        "abilityLevel": 0,
        "displayName": "Incinerate",
        "id": "AnnieW",
        "rawDescription": "GeneratedTip_Spell_AnnieW_Description",
        "rawDisplayName": "GeneratedTip_Spell_AnnieW_DisplayName"
      }
    },
    "championStats": {
      "abilityPower": 0.00000000000000,
      "armor": 0.00000000000000,
      "armorPenetrationFlat": 0.0,
      "armorPenetrationPercent": 0.0,
      "attackDamage": 0.00000000000000,
      "attackRange": 0.0,
      "attackSpeed": 0.00000000000000,
      "bonusArmorPenetrationPercent": 0.0,
      "bonusMagicPenetrationPercent": 0.0,
      "cooldownReduction": -0.00,
      "critChance": 0.0,
      "critDamage": 0.0,
      "currentHealth": 0.0,
      "healthRegenRate": 0.00000000000000,
      "lifeSteal": 0.0,
      "magicLethality": 0.0,
      "magicPenetrationFlat": 0.0,
      "magicPenetrationPercent": 0.0,
      "magicResist": 0.00000000000000,
      "maxHealth": 0.00000000000000,
      "moveSpeed": 0.00000000000000,
      "physicalLethality": 0.0,
      "resourceMax": 0.00000000000000,
      "resourceRegenRate": 0.00000000000000,
      "resourceType": "MANA",
      "resourceValue": 0.00000000000000,
      "spellVamp": 0.0,
      "tenacity": 0.0
    },
    "currentGold": 0.00000000000000,
    "fullRunes": {
      "generalRunes": [{
          "displayName": "Electrocute",
          "id": 8112,
          "rawDescription": "perk_tooltip_Electrocute",
          "rawDisplayName": "perk_displayname_Electrocute"
        },
        {
          "displayName": "Cheap Shot",
          "id": 8126,
          "rawDescription": "perk_tooltip_CheapShot",
          "rawDisplayName": "perk_displayname_CheapShot"
        },
        {
          "displayName": "Eyeball Collection",
          "id": 8138,
          "rawDescription": "perk_tooltip_EyeballCollection",
          "rawDisplayName": "perk_displayname_EyeballCollection"
        },
        {
          "displayName": "Relentless Hunter",
          "id": 8105,
          "rawDescription": "perk_tooltip_8105",
          "rawDisplayName": "perk_displayname_8105"
        },
        {
          "displayName": "Celerity",
          "id": 8234,
          "rawDescription": "perk_tooltip_Celerity",
          "rawDisplayName": "perk_displayname_Celerity"
        },
        {
          "displayName": "Gathering Storm",
          "id": 8236,
          "rawDescription": "perk_tooltip_GatheringStorm",
          "rawDisplayName": "perk_displayname_GatheringStorm"
        }
      ],
      "keystone": {
        "displayName": "Electrocute",
        "id": 8112,
        "rawDescription": "perk_tooltip_Electrocute",
        "rawDisplayName": "perk_displayname_Electrocute"
      },
      "primaryRuneTree": {
        "displayName": "Domination",
        "id": 8100,
        "rawDescription": "perkstyle_tooltip_7200",
        "rawDisplayName": "perkstyle_displayname_7200"
      },
      "secondaryRuneTree": {
        "displayName": "Sorcery",
        "id": 8200,
        "rawDescription": "perkstyle_tooltip_7202",
        "rawDisplayName": "perkstyle_displayname_7202"
      },
      "statRunes": [{
          "id": 5008,
          "rawDescription": "perk_tooltip_StatModAdaptive"
        },
        {
          "id": 5003,
          "rawDescription": "perk_tooltip_StatModMagicResist"
        },
        {
          "id": 5003,
          "rawDescription": "perk_tooltip_StatModMagicResist"
        }
      ]
    },
    "level": 1,
    "summonerName": "Riot Tuxedo"
  },
  "allPlayers": [{
    "championName": "Annie",
    "isBot": false,
    "isDead": false,
    "items": [],
    "level": 1,
    "position": "",
    "rawChampionName": "game_character_displayname_Annie",
    "respawnTimer": 0.0,
    "runes": {
      "keystone": {
        "displayName": "Electrocute",
        "id": 8112,
        "rawDescription": "perk_tooltip_Electrocute",
        "rawDisplayName": "perk_displayname_Electrocute"
      },
      "primaryRuneTree": {
        "displayName": "Domination",
        "id": 8100,
        "rawDescription": "perkstyle_tooltip_7200",
        "rawDisplayName": "perkstyle_displayname_7200"
      },
      "secondaryRuneTree": {
        "displayName": "Sorcery",
        "id": 8200,
        "rawDescription": "perkstyle_tooltip_7202",
        "rawDisplayName": "perkstyle_displayname_7202"
      }
    },
    "scores": {
      "assists": 0,
      "creepScore": 0,
      "deaths": 0,
      "kills": 0,
      "wardScore": 0.0
    },
    "skinID": 0,
    "summonerName": "Riot Tuxedo",
    "summonerSpells": {
      "summonerSpellOne": {
        "displayName": "Flash",
        "rawDescription": "GeneratedTip_SummonerSpell_SummonerFlash_Description",
        "rawDisplayName": "GeneratedTip_SummonerSpell_SummonerFlash_DisplayName"
      },
      "summonerSpellTwo": {
        "displayName": "Ignite",
        "rawDescription": "GeneratedTip_SummonerSpell_SummonerDot_Description",
        "rawDisplayName": "GeneratedTip_SummonerSpell_SummonerDot_DisplayName"
      }
    },
    "team": "ORDER"
  }],
  "events": {
    "Events": [{
      "EventID": 0,
      "EventName": "GameStart",
      "EventTime": 0.00000000000000000
    }]
  },
  "gameData": {
    "gameMode": "CLASSIC",
    "gameTime": 0.000000000,
    "mapName": "Map11",
    "mapNumber": 11,
    "mapTerrain": "Default"
  }
}

The Live Client API provides more granular data than the Match API by including timestamps for key events like deaths, turret destructions, and objective kills. However, it still lacks crucial gameplay details such as player positions, ability usage, and combat interactions.

Several public datasets are available, with Kaggle being a prominent source. A popular Kaggle example demonstrates typical data granularity ~ per-game totals (dragon/baron kills, champion kills, wards placed), and per-minute metrics (CS/gold rates). These datasets generally rely on the RIOT Match API, inheriting its limitations.

Expand to show detailed data
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons blueHeralds blueTowersDestroyed blueTotalGold blueAvgLevel blueTotalExperience blueTotalMinionsKilled blueTotalJungleMinionsKilled blueGoldDiff blueExperienceDiff blueCSPerMin blueGoldPerMin redWardsPlaced redWardsDestroyed redFirstBlood redKills redDeaths redAssists redEliteMonsters redDragons redHeralds redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
4519157822 0 28 2 1 9 6 11 0 0 0 0 17210 6.6 17039 195 36 643 -8 19.5 1721.0 15 6 0 6 9 8 0 0 0 0 16567 6.8 17047 197 55 -643 8 19.7 1656.7
4523371949 0 12 1 0 5 5 5 0 0 0 0 14712 6.6 16265 174 43 -2908 -1173 17.4 1471.2 12 1 1 5 5 2 2 1 1 1 17620 6.8 17438 240 52 2908 1173 24.0 1762.0
4521474530 0 15 0 0 7 11 4 1 1 0 0 16113 6.4 16221 186 46 -1172 -1033 18.6 1611.3 15 3 1 11 7 14 0 0 0 0 17285 6.8 17254 203 28 1172 1033 20.3 1728.5
4524384067 0 43 1 0 4 5 5 1 0 1 0 15157 7.0 17954 201 55 -1321 -7 20.1 1515.7 15 2 1 5 4 10 0 0 0 0 16478 7.0 17961 235 47 1321 7 23.5 1647.8
4436033771 0 75 4 0 6 6 6 0 0 0 0 16400 7.0 18543 210 57 -1004 230 21.0 1640.0 17 2 1 6 6 7 1 1 0 0 17404 7.0 18313 225 67 1004 -230 22.5 1740.4
4475365709 1 18 0 0 5 3 6 1 1 0 0 15899 7.0 18161 225 42 698 101 22.5 1589.9 36 5 1 3 5 2 0 0 0 0 15201 7.0 18060 221 59 -698 -101 22.1 1520.1
4493010632 1 18 3 1 7 6 7 1 1 0 0 16874 6.8 16967 225 53 2411 1563 22.5 1687.4 57 1 0 6 7 9 0 0 0 0 14463 6.4 15404 164 35 -2411 -1563 16.4 1446.3
4496759358 0 16 2 0 5 13 3 0 0 0 0 15305 6.4 16138 209 48 -2615 -800 20.9 1530.5 15 0 1 13 5 11 1 1 0 0 17920 6.6 16938 157 54 2615 800 15.7 1792.0
4443048030 0 16 3 0 7 7 8 0 0 0 0 16401 7.2 18527 189 61 -1979 -771 18.9 1640.1 15 2 1 7 7 5 2 1 1 0 18380 7.2 19298 240 53 1979 771 24.0 1838.0
4509433346 1 13 1 1 4 5 5 1 1 0 0 15057 6.8 16805 220 39 -1548 -1574 22.0 1505.7 16 2 0 5 4 4 0 0 0 0 16605 6.8 18379 247 43 1548 1574 24.7 1660.5
4452162573 0 20 3 1 4 4 6 0 0 0 0 15474 6.6 16611 231 28 331 -1585 23.1 1547.4 15 2 0 4 4 5 1 1 0 0 15143 7.2 18196 216 51 -331 1585 21.6 1514.3
4453038156 0 33 2 1 11 11 7 1 0 1 0 16695 7.0 18507 157 40 -1505 -635 15.7 1669.5 17 1 0 11 11 9 0 0 0 0 18200 7.0 19142 188 52 1505 635 18.8 1820.0
4515594785 1 18 1 1 7 1 11 1 1 0 0 17865 7.4 19102 238 53 3274 1659 23.8 1786.5 12 1 0 1 7 1 0 0 0 0 14591 6.8 17443 240 50 -3274 -1659 24.0 1459.1
4524924257 0 14 3 0 4 9 1 1 0 1 0 14979 6.6 17213 210 52 -3414 -1141 21.0 1497.9 20 3 1 9 4 11 0 0 0 0 18393 7.2 18354 229 51 3414 1141 22.9 1839.3
4516505202 1 15 3 1 4 4 4 0 0 0 0 15722 6.8 17896 224 51 -470 -187 22.4 1572.2 102 1 0 4 4 3 0 0 0 0 16192 7.0 18083 242 48 470 187 24.2 1619.2
4482120064 0 17 1 0 3 7 3 0 0 0 0 15015 6.8 16974 209 53 -1996 -1804 20.9 1501.5 18 3 1 7 3 13 0 0 0 0 17011 7.2 18778 237 51 1996 1804 23.7 1701.1
4523758462 1 14 1 1 10 2 8 0 0 0 0 19733 7.6 20862 263 56 5228 3378 26.3 1973.3 13 2 0 2 10 2 1 1 0 0 14505 6.8 17484 210 64 -5228 -3378 21.0 1450.5
4503636905 0 43 3 0 3 7 3 1 0 1 0 14852 6.8 16888 203 54 -1975 -1345 20.3 1485.2 17 14 1 7 3 6 0 0 0 0 16827 6.8 18233 218 53 1975 1345 21.8 1682.7
4486384947 1 21 4 1 5 4 11 0 0 0 0 16282 6.8 17378 213 49 882 512 21.3 1628.2 19 3 0 4 5 3 2 1 1 0 15400 6.6 16866 228 52 -882 -512 22.8 1540.0
4457103291 0 11 3 0 5 9 5 0 0 0 0 14994 7.0 17924 188 48 -3155 -2773 18.8 1499.4 15 1 1 9 5 9                          

Some third-party vendors partnered with RIOT offer additional APIs. For instance, the Overwolf API combines features from both RIOT APIs but doesn’t provide significant additional granularity beyond the official sources.

Expand to show detailed data
{
  "info": {
    "live_client_data": {
      "active_player": {
        "abilities": {
          "E": {
            "abilityLevel": 0,
            "displayName": "Stacked Deck",
            "id": "CardmasterStack",
            "rawDescription": "GeneratedTip_Spell_CardmasterStack_Description",
            "rawDisplayName": "GeneratedTip_Spell_CardmasterStack_DisplayName"
          },
          "Passive": {
            "displayName": "Loaded Dice",
            "id": "SecondSight",
            "rawDescription": "GeneratedTip_Passive_SecondSight_Description",
            "rawDisplayName": "GeneratedTip_Passive_SecondSight_DisplayName"
          },
          "Q": {
            "abilityLevel": 0,
            "displayName": "Wild Cards",
            "id": "WildCards",
            "rawDescription": "GeneratedTip_Spell_WildCards_Description",
            "rawDisplayName": "GeneratedTip_Spell_WildCards_DisplayName"
          },
          "R": {
            "abilityLevel": 0,
            "displayName": "Destiny",
            "id": "Destiny",
            "rawDescription": "GeneratedTip_Spell_Destiny_Description",
            "rawDisplayName": "GeneratedTip_Spell_Destiny_DisplayName"
          },
          "W": {
            "abilityLevel": 0,
            "displayName": "Pick a Card",
            "id": "PickACard",
            "rawDescription": "GeneratedTip_Spell_PickACard_Description",
            "rawDisplayName": "GeneratedTip_Spell_PickACard_DisplayName"
          }
        },
        "championStats": {
          "abilityHaste": 0,
          "abilityPower": 0,
          "armor": 21,
          "armorPenetrationFlat": 0,
          "armorPenetrationPercent": 1,
          "attackDamage": 25,
          "attackRange": 0,
          "attackSpeed": 0.6510000228881836,
          "bonusArmorPenetrationPercent": 1,
          "bonusMagicPenetrationPercent": 1,
          "cooldownReduction": 0,
          "critChance": 0,
          "critDamage": 0,
          "currentHealth": 534,
          "healthRegenRate": 0,
          "lifeSteal": 0,
          "magicLethality": 0,
          "magicPenetrationFlat": 0,
          "magicPenetrationPercent": 1,
          "magicResist": 30,
          "maxHealth": 534,
          "moveSpeed": 330,
          "physicalLethality": 0,
          "resourceMax": 333,
          "resourceRegenRate": 0,
          "resourceType": "MANA",
          "resourceValue": 333,
          "spellVamp": 0,
          "tenacity": 0
        },
        "currentGold": 0,
        "fullRunes": {
          "generalRunes": [
            {
              "displayName": "Dark Harvest",
              "id": 8128,
              "rawDescription": "perk_tooltip_DarkHarvest",
              "rawDisplayName": "perk_displayname_DarkHarvest"
            },
            {
              "displayName": "Taste of Blood",
              "id": 8139,
              "rawDescription": "perk_tooltip_TasteOfBlood",
              "rawDisplayName": "perk_displayname_TasteOfBlood"
            },
            {
              "displayName": "Eyeball Collection",
              "id": 8138,
              "rawDescription": "perk_tooltip_EyeballCollection",
              "rawDisplayName": "perk_displayname_EyeballCollection"
            },
            {
              "displayName": "Ravenous Hunter",
              "id": 8135,
              "rawDescription": "perk_tooltip_RavenousHunter",
              "rawDisplayName": "perk_displayname_RavenousHunter"
            },
            {
              "displayName": "Presence of Mind",
              "id": 8009,
              "rawDescription": "perk_tooltip_PresenceOfMind",
              "rawDisplayName": "perk_displayname_PresenceOfMind"
            },
            {
              "displayName": "Coup de Grace",
              "id": 8014,
              "rawDescription": "perk_tooltip_CoupDeGrace",
              "rawDisplayName": "perk_displayname_CoupDeGrace"
            }
          ],
          "keystone": {
            "displayName": "Dark Harvest",
            "id": 8128,
            "rawDescription": "perk_tooltip_DarkHarvest",
            "rawDisplayName": "perk_displayname_DarkHarvest"
          },
          "primaryRuneTree": {
            "displayName": "Domination",
            "id": 8100,
            "rawDescription": "perkstyle_tooltip_7200",
            "rawDisplayName": "perkstyle_displayname_7200"
          },
          "secondaryRuneTree": {
            "displayName": "Precision",
            "id": 8000,
            "rawDescription": "perkstyle_tooltip_7201",
            "rawDisplayName": "perkstyle_displayname_7201"
          },
          "statRunes": [
            { "id": 5008, "rawDescription": "perk_tooltip_StatModAdaptive" },
            { "id": 5008, "rawDescription": "perk_tooltip_StatModAdaptive" },
            { "id": 5003, "rawDescription": "perk_tooltip_StatModMagicResist" }
          ]
        },
        "level": 1,
        "summonerName": "Sh4rgaas"
      }
    }
  },
  "feature": "live_client_data"
}

Using The Datasets

Startups currently use these datasets to provide player analysis and insights. Access to more granular data would enable significantly deeper and more sophisticated analysis tools.

Example of overlays provided in game (Image source: blitz.gg)

Blitz.gg that give timers and pathing for players which is pretty cool!

AI voice suggestions (Image source: backseat.gg)

Some startups give AI suggestions to help with gameplay.

U.GG post game analysis (Image source: u.gg)

Others give deep analysis to one’s gameplay.

At 10000 Feet

Overview

Here’s a high level overview of the system, which I’ll describe each part in the following sections

Overview of the system

Replays And Packets

The process begins by fetching encrypted replay files from a data source. These replays contain compressed gameplay state data that the League of Legends client uses to reconstruct game sessions for viewing. After decryption and decompression, the system processes individual encrypted packets through an emulator, which converts them into structured JSON format. The complete process and replay format are detailed in the rofl format section.

Emulator And Decryption Background

The emulatorEmulator is software that runs an architecture. It’s used here to run specific parts of league of legends binary without running the entire game (useful for debugging and seeing what variables are important). An example of an emulator is unicorn engine inspects and debugs the League of Legends binary to handle encrypted packets. Packet data is decrypted when accessed and immediately re-encrypted afterwardYou also might say: you still don’t need an emulator because you can figure out how decryption works and write your own code to decrypt the entire packet buffer without running an emulator.
League of legends is an actively updated game and thus the decryption changes often, so the emulator makes debugging much easier
.

Here’s a simple Python example demonstrating packet decryption using XOR:

TakeDamagePacket has two fields we're interested in extracting information out of

The TakeDamage packet contains two encrypted fields: the target object and damage amount. For this example, we’ll use a simple XOR operation with 0x69 to decrypt these values.

Expand to show code (in-case above python doesn't show)
# Champion has hp and other fields
class Champion:
  def __init__(self, name, id, hp):
    self.name = name
    self.id = id
    self.hp = hp
    
  def __str__(self):
    return f'{self.name}: id [{self.id}] has {self.hp} hp remaining\n'
    
# Encryption/Decryption using XOR -- not real thing
KEY=0x69
def decrypt(data: bytes) -> bytes:
    return bytes(data[i] ^ KEY for i in range(len(data)))

def encrypt(data: bytes) -> bytes:
    return bytes(data[i] ^ KEY for i in range(len(data)))

class TakeDamagePacket:
  def __init__(self, buffer):
    self.buffer = buffer
  
  def target_id_bytes(self) -> bytes:
    return self.buffer[12:16]
    
  def damage_bytes(self) -> bytes:
    return self.buffer[16:20]

# Helper functions
def convert_bytes_to_int(data: bytes) -> int:
  return int.from_bytes(data, byteorder='big')

name = 'Ezreal'
id = 10
hp = 1000
ezreal = Champion(name, id, hp)
print(ezreal)

# example data from rofl file...
buffer = b'\xeb\x9d\x65\x38\xd3\x0e\x47\xfa\xa1\x29\x13\x9b\x69\x69\x69\x63\x69\x69\x68\x45\x75\xe2\x3e\x9f\x50\xfb\xa8\x26\x1f\xc7\x47\x47\x47'

packet = TakeDamagePacket(buffer)

# First, check if the target is ezreal

# Fetch the offset in packet to target id
encrypted_target_id_bytes = packet.target_id_bytes()
print(f'Encrypted target id bytes[' + ''.join(f'\\x{b:02x}' for b in encrypted_target_id_bytes) + ']')

# Decrypt the bytes and convert to int
target_id_bytes = decrypt(encrypted_target_id_bytes)
print(f'Decrypted target id bytes[' + ''.join(f'\\x{b:02x}' for b in target_id_bytes) + ']')

target_id = convert_bytes_to_int(target_id_bytes)
print(f'\nTarget id [{target_id}] == Ezreal id [{ezreal.id}]?')

if target_id == ezreal.id:
  # After access, remove variable and encrypt bytes
  del target_id
  target_id_bytes = encrypt(target_id_bytes)
  print('Damage target is Ezreal\n')
  print(f'Reencrypt target id bytes[' + ''.join(f'\\x{b:02x}' for b in target_id_bytes) + ']')

  # Ezreal took some damage
  encrypted_damage_bytes = packet.damage_bytes()
  print(f'Encrypted damage bytes[' + ''.join(f'\\x{b:02x}' for b in encrypted_damage_bytes) + ']')
  damage_bytes = decrypt(encrypted_damage_bytes)
  print(f'Decrypted damage bytes[' + ''.join(f'\\x{b:02x}' for b in damage_bytes) + ']')
  damage = convert_bytes_to_int(damage_bytes)

  print(f'\nOh no, id [{id}] (Ezreal) took {damage} damage!')
  ezreal.hp -= damage
  print(ezreal)

  del damage
  damage_bytes = encrypt(damage_bytes)
  print(f'Reencrypt damage bytes[' + ''.join(f'\\x{b:02x}' for b in damage_bytes) + ']')

Each packet field undergoes a decrypt-access-release cycle. In the TakeDamagePacket example, when checking if Ezreal is the target, the id field is decrypted, compared, and immediately re-encrypted and deleted. Once the comparison is complete, the decrypted reference is deleted. The same process applies to the damage field. This pattern (unintentionally) is a good defense against memory scannersMemory scanners such as Cheat Engine allows reverse engineers to find determinstic values in memory and be able to trace them back to the code that modifies or reads the values. By deleting the value, it makes it harder for reverse engineers to find important values and thus, important sections of code that produces that value.

Replays

Data sources

Replays files are available through the official league client or third-party websites such as op.gg or Mobalytics. The replays end in the .rofl Interesting name to choose for the file extension.

How Replays Are Viewed

Players can watch replays through either the League client interface or command line.

Example of how to view a replay through the client

ROFL Replay

While the ROFL replay format is undocumented, community efforts have reverse-engineered its structure. Here’s a simplified version of the format:

The ROFL file format

The key components are the encrypted chunks. Each chunk contains packet metadata (time, type, size) and encrypted packet data. While the timestamps tell us when events occur, we still need to figure out what the packets actually do.

Visually Tracing How Packets Are Used

To understand packet structures, we can analyze what the client must process during replay playback. Let’s examine a specific game frame:

Example of viewing a replay (Image source: League Of Legends)

Frame Analysis (Timestamp 4:02):

There are four champions in the image.

  • Jhin, champion with blue hp on bottom left
  • Braum,
  • Thresh,
  • Ezreal,

Two key actions are occurring:

  • Jhin is taking damage (as indicated by a small red indicator of his health bar).
  • Ezreal and Thresh (red health bars) moving leftward

We should expect at least two packets:

  • Damage Packet
    • Jhin’s id
    • float value for damage taken
  • Movement packet
    • Ezreal or Thresh’s id
    • X, Y, Z coordinates
    • Start and end positions

These packets should align with the timestamp and visible game state. The next section will explore their detailed implementation.

Reverse Engineering

Some Background

League of Legends runs on a custom game engine developed in 2009. At that time, Unreal Engine was the primary commercial option, as Unity hadn’t yet emerged. This means we’re working with a closed-source game engine.

While RIOT’s technical blogs offer insights into their enginePlanning a future game engine for league which discusses the issues with the current game engineGraphics pipeline, they don’t reveal the implementation details we need.

Our analysis focuses on the I/O and networking layers where packet processing occurs. We’re particularly interested in the pipelineIn this context, a pipeline as a process that takes an input, transforms the input and splits out an output that transforms raw packet data into game state:

A stream of packets flows through a game engine to update game state

Packet Processing

Packets are done in 3 steps:

  • Memory Allocation (Packet::Packet)
    • Sets up packet metadata
    • Allocates space for packet data
  • Deserialization (DeserializePacket)
    • Converts chunk data to in-memory representation
    • Extracts some information from the chunk data as packet fields
  • Game State Update (UsePacket)
    • Applies packet data to game state
GameState game_state;

int ParseBufferAndUpdateGameState(char* buffer, uint64_t buffer_size)
{
  // Read the packet type from the buffer
  uint16_t packet_type = *(uint16_t*)buffer;

  // Allocate an empty packet data structure
  Packet packet(packet_type);

  // Parse buffer into packet data structure
  int status_ok = packet.deserialize(buffer, buffer_size);

  // Update game state with packet
  if (status_ok)
    game_state.use_packet(&packet);

  // At this point, the packet is deallocated,
  // invoking the destructor Packet::~Packet()
  return status_ok;
}

Here’s an example of TakeDamagePacket’s DeserializePacket function

enum PacketType : uint16_t
{
  TAKE_DAMAGE = 0x2385,
};

struct TakeDamagePacket : public Packet
{
  PacketType packet_type;
  char* data; // To be parsed later by game state (this contains the source and target id and how much damage was applied)
  int expected_size = 12; // Should expect this left in the buffer
  bool flag1 = false; // Other misc data parsed
}

virtual bool TakeDamagePacket::deserialize(char* buffer, uint64_t buffer_size)
{
  // Skip the packet type from buffer
  char* start_buffer = buffer + sizeof(PacketType);

  // Check if buffer has enough size to read into this packet
  if (buffer_size - sizeof(PacketType) < expected_size)
  {
    return false;
  }

  // Copy the data over
  memcpy(this->data, start_buffer, this->expected_size);

  // See if certain flag is set
  if (start_buffer[3] == 79)
    this->flag1 = true;

  return true;
}
Click to expand to see how we got to packet decompiled code

The first decompiled code snippet below shows the parent function that takes in the raw chunk data, instantiates a specific packet object using a factory class to make a packet object based on its type. At this point, the packet object doesn’t hold any information from the chunk data, so it deserializes the packet data called from its vtable ((*(_QWORD *)packet + 8LL)(packet, &v19, (char *)v6 + v4), which is DeserializePacket). Now, the packet object contains metadata and the chunk data. Then, it uses it by calling from its vtable again ((*(_QWORD *)v11 + 16LL))(v11, packet_, v20))), which calls UsePacket to update the game state.

__int64 __fastcall CallPacketsInOrder(__int64 a1, __int16 **raw_data, __int64 a3, double a4)
{
  __int64 packet; // [rsp+18h] [rbp-38h] BYREF
  __int16 packet_type; // [rsp+24h] [rbp-2Ch] BYREF

  packet_type = **raw_data;
  Packet::Packet(&packet, (unsigned __int16)packet_type);// Packet::Packet(&packet, packet_type);
  if ( !packet )
    return 0;
  v7 = (*(__int64 (__fastcall **)(__int64, __int16 **, char *))(*(_QWORD *)packet + 8LL))(packet, &v19, (char *)v6 + v4);// Calls Packet::Deserialize from packet's vtable
  packet_ = packet;
  *(double *)(a1 + 688) = v20;
  ...
          v11 = *(_QWORD *)(*(_QWORD *)(a1 + 696) + 8 * v10);
          v12 = (*(__int64 (__fastcall **)(__int64, __int64, double))(*(_QWORD *)v11 + 16LL))(v11, packet_, v20);// Calling UsePacket from Packet's VTable
  ...
  if ( packet )
    (*(void (__fastcall **)(__int64))(*(_QWORD *)packet + 32LL))(packet); // Call destructor to packet
  return v7;
}

Decompiled Packet Code

Expand to show decompiled code for allocating memory
int64 *fastcall Packet::Packet(PacketStruct *packet, int packet_type)
{
  switch ( a2 )
  {
    case 2:
      v4 = operator new(0xEuLL);
      *(_WORD *)(v4 + 8) = 2;
      *(_DWORD *)(v4 + 10) = 0;
      v5 = off_101BECE88;
      goto LABEL_639;
    case 3:
      v101 = operator new(0x28uLL);
      *(_WORD *)(v101 + 8) = 3;
      *(_DWORD *)(v101 + 10) = 0;
      *(_QWORD *)v101 = off_101BED530;
      *(_QWORD *)(v101 + 16) = 0LL;
      *(_QWORD *)(v101 + 24) = 0LL;
      *(_WORD *)(v101 + 32) = -22610;
      *packet = v101;
      return packet;
    case 5:
      v4 = operator new(0x20uLL);
      *(_WORD *)(v4 + 8) = 5;
      *(_DWORD *)(v4 + 10) = 0;
      v31 = off_101BEA880;
      goto LABEL_578;
    case 9:
      v4 = operator new(0x20uLL);
      *(_WORD *)(v4 + 8) = 9;
      *(_DWORD *)(v4 + 10) = 0;
      v31 = off_101BE9618;
      goto LABEL_578;
    case 10:
      v4 = operator new(0xEuLL);
      *(_WORD *)(v4 + 8) = 10;
      *(_DWORD *)(v4 + 10) = 0;
      v5 = off_101BECF80;
      goto LABEL_639;
    case 11:
      v381 = operator new(0x1CuLL);
      *(_WORD *)(v381 + 8) = 11;
      *(_DWORD *)(v381 + 10) = 0;
      *(_QWORD *)v381 = off_101BEDDC0;
      *(_QWORD *)(v381 + 16) = 0xE7E7E7E7E7E7E7E7LL;
      *(_DWORD *)(v381 + 24) = -404232217;
      *packet = v381;
      return packet;
    case 12:
      v6 = operator new(0x38uLL);
      *(_WORD *)(v6 + 8) = 12;
      *(_DWORD *)(v6 + 10) = 0;
      *(_QWORD *)v6 = off_101BEEA98;
      *(_QWORD *)(v6 + 16) = off_101BEEA60;
      *(_DWORD *)(v6 + 24) = 320017171;
      *(_QWORD *)(v6 + 32) = 0xBBBBBBBBBBBBBBBBLL;
      *(_QWORD *)(v6 + 40) = 0xCECECECE2E2E2E2ELL;
      *(_DWORD *)(v6 + 48) = -235802127;
      *packet = v6;
      return a1;
Expand to show decompiled code for decrypting packet
char __fastcall DeserializePacket(Packet* packet, unsigned __int64 *a2, unsigned __int64 a3)
{
  v3 = *a2;
  if ( a3 - *a2 < 6 )
    return 0;
  v5 = a2;
  v6 = packet;
  *(_WORD *)(packet + 8) = *(_WORD *)v3;
  *(_DWORD *)(packet + 10) = *(_DWORD *)(v3 + 2);
  v7 = (__int16 *)*a2;
  v8 = (_BYTE *)(*a2 + 6);
  *a2 = (unsigned __int64)v8;
  if ( a3 - (unsigned __int64)v8 < 3 )
    return 0;
  *a2 = (unsigned __int64)v7 + 9;
  v10 = 86;
  if ( (v7[3] & 8) == 0 )
    v10 = 114;
  *(_BYTE *)(packet + 14) = v10;
  switch ( (*((unsigned __int8 *)v7 + 7) >> 4) & 7 )
  {
    case 0:
      *(_DWORD *)(packet + 16) = -1027423577;
      goto LABEL_20;
    case 1:
      if ( (unsigned __int8)sub_100D8D2F0(packet + 16, a2, a3) )
        goto LABEL_20;
      return 0;
    case 2:
      if ( (unsigned __int8)sub_100D8CFB0(packet + 16, a2, a3) )
        goto LABEL_20;
      return 0;
    case 3:
      if ( (unsigned __int8)sub_100D8D490(packet + 16, a2, a3) )
        goto LABEL_20;
      return 0;
    case 4:
      if ( (unsigned __int8)sub_100D8D150(packet + 16, a2, a3) )
        goto LABEL_20;
      return 0;
    case 5:
      *(_DWORD *)(packet + 16) = -1027423550;
      goto LABEL_20;
    case 6:
      *(_DWORD *)(packet + 16) = -1027423610;
      goto LABEL_20;
    case 7:
      *(_DWORD *)(packet + 16) = 1852730990;
Expand to show decompiled code for updating game state
__int64 __fastcall UsePacket(
        __int64 a1,
        __int64 a2,
        __m128 a3,
        __m128 a4,
        double a5,
        double a6,
        double a7,
        double a8,
        double a9)
{
 v9 = *(double *)&a2;
  v10 = 1;
  switch ( *(_WORD *)(a2 + 8) )
  {
    case 0xA:
      sub_100384590(d);
      return v10;
    case 0xE:
      v564 = *(_DWORD *)(a2 + 16);
      v565 = ((187
             - ((65793
               * ((2050
                 * ((unsigned __int8)((65793
                                     * ((2050 * (unsigned __int8)(v564 - 86)) & 0x22110 | (32800
                                                                                         * (unsigned __int8)(v564 - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x22110 | (32800 * ((unsigned __int8)((65793 * ((2050 * (unsigned __int8)(v564 - 86)) & 0x22110 | (32800 * (unsigned __int8)(v564 - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x88440u)) >> 16)) ^ 0xED)
           + 244;
      v566 = (unsigned __int8)((65793
                              * ((2050 * (unsigned __int8)(HIBYTE(v564) - 86)) & 0x22110 | (32800
                                                                                          * (unsigned __int8)(HIBYTE(v564) - 86)) & 0x88440u)) >> 16) ^ 0xE6;
      v567 = c(
               d,
               ((unsigned __int16)(((-17664
                                   - (((65793
                                      * ((2050
                                        * ((unsigned __int8)((65793
                                                            * ((2050 * (unsigned __int8)(BYTE1(v564) - 86)) & 0x22110 | (32800 * (unsigned __int8)(BYTE1(v564) - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x22110 | (32800 * ((unsigned __int8)((65793 * ((2050 * (unsigned __int8)(BYTE1(v564) - 86)) & 0x22110 | (32800 * (unsigned __int8)(BYTE1(v564) - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x88440u)) >> 8) & 0xFF00)) ^ 0xED00)
                                 - 3072)
              + ((-1157627904 - ((16843008 * ((2050 * v566) & 0x22110 | (32800 * v566) & 0x88440)) & 0xFF000000)) ^ 0xED000000 | (((12255232 - ((65793 * ((2050 * ((unsigned __int8)((65793 * ((2050 * (unsigned __int8)(BYTE2(v564) - 86)) & 0x22110 | (32800 * (unsigned __int8)(BYTE2(v564) - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x22110 | (32800 * ((unsigned __int8)((65793 * ((2050 * (unsigned __int8)(BYTE2(v564) - 86)) & 0x22110 | (32800 * (unsigned __int8)(BYTE2(v564) - 86)) & 0x88440u)) >> 16) ^ 0xE6)) & 0x88440)) & 0xFF0000)) ^ 0xED0000) + 15990784) & 0xFF0000)
              - 201326592) | (unsigned __int8)v565);
      if ( v567 )
      {
        v568 = sub_1002F12F0(d, *(unsigned int *)(v567 + 16));
        if ( v568 )
        {
          v569 = v568;
          v570 = (*(__int64 (__fastcall **)(__int64))(*(_QWORD *)v568 + 16LL))(v568);
          v571 = sub_1000EE9E0();
          v = sub_123(v570, v569, v571);
          if ( v )
            (*(void (__fastcall **)(__int64, _QWORD))(*(_QWORD *)v + 2960LL))(
              v,
              (unsigned __int8)((((65793
                                 * ((2050
                                   * (unsigned __int8)((65793
                                                      * ((2050 * *(unsigned __int8 *)(a2 + 14)) & 0x22110 | (32800 * *(unsigned __int8 *)(a2 + 14)) & 0x88440u)) >> 16)) & 0x22110 | (32800 * (unsigned __int8)((65793 * ((2050 * *(unsigned __int8 *)(a2 + 14)) & 0x22110 | (32800 * *(unsigned __int8 *)(a2 + 14)) & 0x88440u)) >> 16)) & 0x88440u)) >> 15) & 0xAA)
                              + (((65793
                                 * ((2050
                                   * (unsigned __int8)((65793
                                                      * ((2050 * *(unsigned __int8 *)(a2 + 14)) & 0x22110 | (32800 * *(unsigned __int8 *)(a2 + 14)) & 0x88440u)) >> 16)) & 0x22110 | (32800 * (unsigned __int8)((65793 * ((2050 * *(unsigned __int8 *)(a2 + 14)) & 0x22110 | (32800 * *(unsigned __int8 *)(a2 + 14)) & 0x88440u)) >> 16)) & 0x88440u)) >> 17) & 0x55)
                              + 51));
        }
      }
      return v10;
    case 0x14:
      v27 = __ROL1__(*(_BYTE *)(a2 + 14), 5);
      v28 = __ROL1__(((v27 >> 1) & 0x55 | (2 * v27) & 0xAA) + 50, 3);
      sub_100384E20(
        d,
        (unsigned __int8)((65793 * ((2050 * v28) & 0x22110 | (32800 * v28) & 0x88440u)) >> 16) != 0);
      return v10;
    case 0x15:
      sub_101153C40();
      if ( *(_DWORD *)(a2 + 24) )
      {
        v930 = *(_QWORD *)(a2 + 16);
        v931 = 60LL * *(unsigned int *)(a2 + 24);
        do
        {
          sub_100170F20(v930);
          v930 += 60LL;
          v931 -= 60LL;
        }
        while ( v931 );
      }
      sub_1001BCB10(0LL);
      return v10;

As you may notice, all three functions is composed of a big switch statement: it’s because it runs each case statement based on the packet type.

We’re interested in how the packet is used to update game state. Let’s take a look into that.

Reading The Packet

After deserialization, we still can’t directly read the packet data. The packet object maintains encrypted chunk data, only decrypting fields when they’re accessed - as explained in the emulator and decryption section. Let’s examine how this works in practice by analyzing the TakeDamage function, focusing on two key pieces of information that we want to extract: who is taking damage and how much damage they’ve taken.

Click to expand to view the raw decompiled TakeDamage function
char __fastcall TakeDamage(__int64 idk, __int64 packet)
{
  v3 = *(_DWORD *)(packet + 16);
  v4 = HIWORD(v3);
  v5 = HIBYTE(v3);
  LOBYTE(v3) = __ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v3, 4)], 4);
  v6 = byte_1019891E0[(unsigned __int8)(__ROL1__(
                                          ~byte_1019891E0[(unsigned __int8)~__ROL1__(BYTE1(*(_DWORD *)(packet + 16)), 4)],
                                          4)
                                      - 54)];
  v7 = 65793 * ((2050 * v6) & 0x22110 | (32800 * v6) & 0x88440);
  v8 = byte_1019891E0[(unsigned __int8)(__ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v4, 4)], 4) - 54)];
  v9 = (2050 * v8) & 0x22110 | (32800 * v8) & 0x88440;
  v10 = byte_1019891E0[(unsigned __int8)(__ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v5, 4)], 4) - 54)];
  target_object = GetInternalObjectFromID(
                    qword_101C60850,
                    (16843008 * ((2050 * v10) & 0x22110 | (32800 * v10) & 0x88440)) & 0xFF000000 | (65793 * v9) & 0xFF0000 | (v7 >> 8) & 0xFF00 | (unsigned __int8)((65793 * ((2050 * byte_1019891E0[(unsigned __int8)(v3 - 54)]) & 0x22110 | (32800 * byte_1019891E0[(unsigned __int8)(v3 - 54)]) & 0x88440u)) >> 16));
  ...
  source_object = GetInternalObjectFromID(
                    qword_101C60850,
                    ((unsigned __int8)(__ROR1__(
                                         byte_1019891E0[byte_1019891E0[(unsigned __int8)(-113
                                                                                       - HIBYTE(*(_DWORD *)(packet + 36)))]],
                                         1)
                                     + 89) << 24) | ((unsigned __int8)(__ROR1__(
                                                                         byte_1019891E0[byte_1019891E0[(unsigned __int8)(-113 - HIWORD(*(_DWORD *)(packet + 36)))]],
                                                                         1)
                                                                     + 89) << 16) | ((unsigned __int8)(__ROR1__(byte_1019891E0[byte_1019891E0[(unsigned __int8)(-113 - BYTE1(*(_DWORD *)(packet + 36)))]], 1) + 89) << 8) | (unsigned int)(unsigned __int8)(__ROR1__(byte_1019891E0[byte_1019891E0[(unsigned __int8)(-113 - *(_DWORD *)(packet + 36))]], 1) + 89));
  ...
  v46 = (HIWORD(*(_DWORD *)(packet + 24)) ^ 0xF2) + 9;
  (*(void (__fastcall **)(__int64, double))(*(_QWORD *)obj + 1920LL))(
    obj,
    *(double *)_mm_cvtsi32_si128(__ROL1__(      // float that represents damage taken
                                   ((unsigned __int8)((*(_DWORD *)(packet + 24) ^ 0xF2) + 9) >> 1) & 0x55 | (2 * ((*(_DWORD *)(packet + 24) ^ 0xF2) + 9)) & 0xAA,
                                   5) ^ 4 | ((__ROL1__(
                                                ((unsigned __int8)((BYTE1(*(_DWORD *)(packet + 24)) ^ 0xF2) + 9) >> 1) & 0x55 | (2 * ((BYTE1(*(_DWORD *)(packet + 24)) ^ 0xF2) + 9)) & 0xAA,
                                                5) ^ 4) << 8) | ((__ROL1__((v46 >> 1) & 0x55 | (2 * v46) & 0xAA, 5) ^ 4) << 16) | ((unsigned __int8)(__ROL1__(((unsigned __int8)((HIBYTE(*(_DWORD *)(packet + 24)) ^ 0xF2) + 9) >> 1) & 0x55 | (2 * ((HIBYTE(*(_DWORD *)(packet + 24)) ^ 0xF2) + 9)) & 0xAA, 5) ^ 4) << 24)).i64);
  ...
  return 1;
}

Tthe second parameter to GetInternalObjectFromID is the id that are the source and target network ids. The functions converts the network ids to the game engine objects by looking through a map of [id -> object pointer]. Then, to apply damage to the target object, there’s a call to obj, using its vtable at offset 0x1920 to apply damage. The conversion using _mm_cvtsi32_si128 just puts the bytes into the xmm0 register because the function takes in a float.

The code can be boiled down to this

void TakeDamage(__int64 idk, char* packet_buffer)
{
  target_object = GetInternalObjectFromID(DECRYPT_VALUE1((uint32_t)packet_buffer + 16));
  source_object = GetInternalObjectFromID(DECRYPT_VALUE2((uint32_t)packet_buffer + 36));

  target_object->take_damage(DECRYPT_FLOAT1((float)packet + 24));
}

GetInternalObjectFromID converts an id (uint32_t) to an internal object in the game. DECRYPT_VALUE1 and DECRYPT_VALUE2 are different inlined functions that decrypts a value that I’ll explain in the next section.

Below is the an emulator for GetInternalObjectFromID for the target object. Feel free to try to it out. In the end, you should expect the value 265086324 in RSI register, which matches the second argument to GetInternalObjectFromID. RDI is the first argument, which is a global map of id -> object and RSI the is the second argument containing the network id.

Address Bytes Assembly
Memory
Address Bytes ASCII
Registers
Reg Value Number

How Does The Decryption Work

Let’s break down this decompiled code: target_object = GetInternalObjectFromID(DECRYPT_VALUE1((uint32_t)packet_buffer + 16));? At first glance it’s hard to read, so let’s examine what’s happening step by step:

  v3 = *(_DWORD *)(packet + 16);
  v4 = HIWORD(v3);
  v5 = HIBYTE(v3);
  LOBYTE(v3) = __ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v3, 4)], 4);
  v6 = byte_1019891E0[(unsigned __int8)(__ROL1__(
                                          ~byte_1019891E0[(unsigned __int8)~__ROL1__(BYTE1(*(_DWORD *)(packet + 16)), 4)],
                                          4)
                                      - 54)];
  v7 = 65793 * ((2050 * v6) & 0x22110 | (32800 * v6) & 0x88440);
  v8 = byte_1019891E0[(unsigned __int8)(__ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v4, 4)], 4) - 54)];
  v9 = (2050 * v8) & 0x22110 | (32800 * v8) & 0x88440;
  v10 = byte_1019891E0[(unsigned __int8)(__ROL1__(~byte_1019891E0[(unsigned __int8)~__ROL1__(v5, 4)], 4) - 54)];
  target_object = GetInternalObjectFromID(
                    qword_101C60850,
                    (16843008 * ((2050 * v10) & 0x22110 | (32800 * v10) & 0x88440)) & 0xFF000000 | (65793 * v9) & 0xFF0000 | (v7 >> 8) & 0xFF00 | (unsigned __int8)((65793 * ((2050 * byte_1019891E0[(unsigned __int8)(v3 - 54)]) & 0x22110 | (32800 * byte_1019891E0[(unsigned __int8)(v3 - 54)]) & 0x88440u)) >> 16));

The decryption process uses a 255-byte lookup table combined with arithmetic operations. It’s an enhanced version of XOR obfuscation: instead of a simple XOR, each byte goes through multiple transformations:

  • Initial lookup in the 255-byte table
  • Series of arithmetic operations (multiply, add, subtract)
  • Additional table lookups
  • Final transformation to produce the decrypted value

Boiling this down, it looks like this:

  char lookup_table[255] = { 0x56, 0x82, 0xDC, 0x83, 0x02, 0x8F, 0x29, 0x35, 0x04, 0x21, 0x71, 0x79, 0x9E, 0x92, 0x7F, ... };
  uint32_t enc_data = *(uint32_t*)((char*)packet + 16);
  uint8_t lookup_byte1 = enc_data & 0xFF;
  uint8_t lookup_byte2 = (enc_data >> 8) & 0xFF;
  uint8_t lookup_byte3 = (enc_data >> 16) & 0xFF;

  // Here are the operations for 1 byte
  uint8_t lookup_byte3_intermediate = lookup_table[~lookup_byte3 >>= 4];
  // lookup again
  uint8_t value3 = lookup_table[lookup_byte3_intermediate];
  uint32_t transform_value3 = (16843008 * (2050 * value3) & 0x22110) | (value3 * 32800);
  // Extract the first byte out of it to place in our resulting value (network id)
  uint32_t high_byte = transform_value3 & 0xFF000000

  // Repeat for other bytes...

  uint32_t network_id = 0; // To fill in
  network_id |= high_byte;

Here’s the emulator for taking damage target_object->take_damage(DECRYPT_FLOAT1((float)packet + 24)); where performs the decryption and finally the damage value appears. At the end, you should get 000000003e874000 in RAX which is 0.26416015625

Address Bytes Assembly
Memory
Address Bytes ASCII
Registers
Reg Value Number

So what’s the catch?

While it might seem feasible to reimplement these functions in Python without running the client, several factors make this approach impracticalActually, a more seasoned reverse engineer may know better tools for doing such things.:

  • the encryption and decryption methods changes
    • lookup table might be different
    • the operations after the lookup might be different
    • there’s not a single global lookup table
      • a different lookup table associated with every function
  • the chunk layout (fields) get shuffled per patch
    • What may be in 0x10 offset of the packet may now be at offset 0x38
  • the packet types switch up each patch

“Beating” The Obfuscation

Instead of reverse-engineering each encryption function, I chose a pragmatic approach: reading values directly from registers. This method requires an emulator, which typically runs rather slow (unicorn emulator). Instead, I developed an “exception emulator” that runs natively (detailed in the performance section).

For example, in the TakeDamage function, I intercept two key values:

  • The network id from the RSI register before target_object creation
  • The damage value from RAX before it’s applied

This interception uses a trampoline hookIf you’re not familiar with trampoline hook, it’s essentially a callback that you add or similar to breakpointing in gdb and editing the program’s variables before continuing:

Visually demonstrating what a trampoline hook does (Image source: codereversing.com)
Click to expand to view how the trampoline hook works

The process has two parts:

Save original bytes and insert jump:

movd xmm0,eax
jmp FFF90000 # Insert my trampoline hook here

Create trampoline:

  • Save all register states
  • Call MY_FUNCTION to expose registers as a struct
  • Allow high-level language access to register values
  • Restore registers
  • Execute original instructions
  • Return to normal code flow
push rsp                               
pushfq                                 
...     
push r8                                
push r9                                
push r10                               
push r11                               
push r12                               
push r13                               
push r14                               
push r15                               
push rbx                               
push rcx                               
push rdx                               
push rsi                               
push rdi                               
push rbp                               
sub rsp,40                             
movaps xmmword ptr ss:[rsp],xmm0       
movaps xmmword ptr ss:[rsp+10],xmm1    
movaps xmmword ptr ss:[rsp+20],xmm2    
movaps xmmword ptr ss:[rsp+30],xmm3    
...
mov rax, MY_FUNCTION # This is what we redirect to, to read RAX
call rax                               
...
movaps xmm0,xmmword ptr ss:[rsp]       
movaps xmm1,xmmword ptr ss:[rsp+10]    
movaps xmm2,xmmword ptr ss:[rsp+20]    
movaps xmm3,xmmword ptr ss:[rsp+30]    
add rsp,40                             
pop rbp                                
pop rdi                                
pop rsi                                
pop rdx                                
pop rcx                                
pop rbx                                
pop r15                                
pop r14                                
pop r13                                
pop r12                                
pop r11                                
pop r10                                
pop r9                                 
pop r8                      
...            # Execute original function somewhere here        
jmp qword ptr ds:[FFF90105] 
xchg esp,eax                           
ret far        # Jump back

At the high level, Rust code interfaces with these hooks to read and modify program state:

emulator.hook_address(TakeDamageAddress, move |emulator, context| {
    let damage_float = convert_xmm_to_float(context.xmm0);
    obj.take_damage(damage_float);
});

By applying similar hooks to other game events (ability casts, movement, resource changes, etc.), we can reconstruct the complete gameplay state.

Why do it this way?

  • Works directly with compressed ROFL files
  • Uses fewer resources than running the game
    • Ignores irrelevant game systems
  • Provides precise value extraction

Why not do it this way?

  • Requires hooking undocumented packet structures
    • Have deep understanding of packet structures
  • System breaks with unexpected packet changes
  • Potential for value extraction errors
  • No game engine validation

Alternative Approach: Direct Game State Reading

MiscellaneousStuff’s approach avoids packet analysis entirely by reading game state directly:

Replay extraction by running the game and reading game state per interval (Image source: miscellaneousstuff.github.io)

This method works by running the full game client (sound, graphics, network, etc) in replay mode, sampling the game at fixed time intervals and dumping the game state snapshots to disk.

While simpler to maintain, this sampling approach trades precision for simplicity. Lower sampling intervals increase precision but increases data usage, similar to sample-based profilers.

Performance

Since we’re scraping, perfomance matters a little bit. We don’t want it to be running at a snail’s pace (hours+ per replay).

I’ve written the emulator originally with the unicorn emulatorIf you’re unfamiliar with unicorn emulator, it’s a CPU emulator. I like to think of it as GDB with much more powerful scripting capabilities in python. It was a good for bootstrapping, but it was terribly slow (by how much? somewhere like 5 minutes per replay).

I’ve since rewritten it in rust with an api similar to unicorn emulator’s api. I called it the exception emulator because it runs natively and any exceptions (ex, out of bounds memory accesses) are routed and handled by a hook similar to unicorn’s api. It works by inserting a software breakpoint (INT3 instruction on x86 CPUs) and creating an exception handler that lookups the corresponding the function to call that is mapped to the breakpoint. I use this to reverse functions where the game state is not properly initialized.

My very simple methodology for evaluating my program using a single replay with a machine setup:

  • Input:
    • ~15 minute replay (11.5mb rofl file)
    • “Gaming” machine
    • NVME device
    • Windows 11
    • Other applications are running like chrome and discord
  • Output:
    • prettified json ~(135MB)
PS> Measure-Command {start-process cargo "run --release -- -file test.rofl" -Wait}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 3
Milliseconds      : 33
Ticks             : 30338426
TotalDays         : 3.51139189814815E-05
TotalHours        : 0.000842734055555556
TotalMinutes      : 0.0505640433333333
TotalSeconds      : 3.0338426
TotalMilliseconds : 3033.8426

Running with one replay of ~15 minutes finishes in 3 seconds. Eyeballing task manager, it takes around 400MB up when running. Output is 135MB of prettified json.

Comparison to MiscellaneousStuff’s approach

Author also does some napkin math ~ 26 / 16 + 0.25 = 1.6 minutes = 96 seconds for 26 minute replay. So accounting for the replay, I’m running at 5 seconds, which is a bit faster than the author’s approach. The memory might bit a bit higher (running the game), which I’ve eyeballed at 1GB. Space utilization is ~820MB unpacking one game from his repo compared to 300MB - 500MB.

While this does show that my approach has better performance, this is definitely eyeballing the results. I haven’t ran the author’s code.

In either case, both should scale decently. Both can run multiple instances of the game/emulator.

Other Tidbits

Packet Optimizations

If you are working at RIOT, I’ve listed some optimizations that could be applied to the packets. I don’t have a full view of how the game engine/server works, so some of these optimizations may not apply.

Repeat Packets

Packets of leaving the fog appear multiple times (20+ repeats sometimes!) at the same time for the same id. I don’t see why this should be done unless there’s an engine specific issue or an issue when generating a replay.

{
  "LeaveFromFog": {
    "time": 862.4249,
    "id": 1073741859
  }
}

A look into a single replay file shows the following:

Repeat packets: 21762
Total packets: 770650
Percentage of repeat packets: 2.82%
Repeat packet total bytes: 65370
Packet total bytes: 3399813
Percentage of bytes used up by repeat packets: 1.92%

A graph showing the breakdown of repeat packets per packet type

Removing these repeats in the replay could reduce the amount of data used per replay by around ~2%, which could save cost for hosting the space and bandwidth saved using the cloud (perhaps AWS).

Doing some quick estimates only for storage writes cost

~30 million active players per day and a player plays at least 2 games per day

average replay size = 10MB = 10 MB
optimization = 0.02
daily active users = 30 million / day = 30M/day
number of games = daily active users × 2 / 10 = 6M/day
data written in a day = number of games × average replay size to GB/day = 60,000 GB/day
data saved = optimization × data written in a day = 1,200 GB/day
aws S3 cost = $0.02 per GB  = $0.02/GB
money saved per day = data saved × aws S3 cost = $24.00/day
money saved per day per month = $730.49
money saved per day per year = $8,765.82

Integer Values

For integer values (ids, hashes, etc), consider using variable integer compression, which is commonly used in serialization frameworks such as protobuf and wasm to save more space.

The chunk format already does this for single byte versus 4 byte integer, the variable integer compression can be used to create 2 byte or 3 byte integers.

Same-ish Packets

Some packets have different ids, but mostly do the same thing For example, there could be a MovementPacket1 and MovementPacket2, and they both have the same fields, except for 1 or 2 extra fields in MovementPacket2. I believe that merging them could reduce complexity, but I do not know how the game server is structured or how the code base is structured

Repeat of Different Packets

Some packets encompass the same meaning. For example, typically, a damage packet is applied (by the same id) followed by a death packet at the same time. The damage packet is not necessary. However, due to the game engine and the way it is structured, it may be difficult to remove this.

Replay Time offset

An optimization I observed for each packet is using a byte (instead of a float) to represent the time offset for the next packet. This is quite neat.

Game Engine Optimizations

I’m going to suggest some game engine optimizations below. However, consider that the game engine itself is quite old, it may be difficult to near impossible to change how the engine is structured.

Single Threadedness

Packets are handled as follows:

Create/Allocate packet -> Deserialize buffer into packet -> Update game state with packet

Instead, multiple threads could be used to split the work

Thread 1: Create/Allocate packet -> Deserialize buffer into packet
Thread 2: Create/Allocate packet -> Deserialize buffer into packet
Thread 3: Create/Allocate packet -> Deserialize buffer into packet
Thread 4: Create/Allocate packet -> Deserialize buffer into packet

Thread 5: Update game state with packet

The packets may be out of order, so have another thread order the packets with queues:

Thread 1: Create/Allocate packet -> Deserialize buffer into packet -> Unordered packet queue
Thread 2: Create/Allocate packet -> Deserialize buffer into packet -> Unordered packet queue
Thread 3: Create/Allocate packet -> Deserialize buffer into packet -> Unordered packet queue
Thread 4: Create/Allocate packet -> Deserialize buffer into packet -> Unordered packet queue

Thread 6: Unordered packet queue -> Order packets -> Ordered packet queue

Thread 5: Ordered Packet queue -> Update game state with packet

Buffer Pools

Instead of allocating/freeing memory for each packet, use a buffer/memory pool for commonly used packets to reduce CPU overhead.

Thoughts

Improvements

One idea is to lift the assembly to LLVM, perform static analysis to generate or extract the decryption mechanism or the functions themselves into a different language such as C++, but that would take way too much time.

Lastly, the bottleneck in the system is most likely replay aggregation (which I haven’t measure yet, but I would assume so given that downloading the replay takes seconds).

Wrapping Up

Fun project. Captured hundred of thousands of replays and did some analysis on them, but got busy. Learned a bit more about reversing and rust.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • League of Legends data scraping the hard and tedious way for fun
  • Uniqueness
  • Paul Graham - When To Do What You Love
  • Review - Managing Memory Tiers with CXL in Virtualized Environments
  • Paul Graham - The Right Kind of Stubborn