Project Overview
League of Legends (LoL) is one of the most popular multiplayer online battle arena (MOBA) games, where two teams of five players compete to destroy the opposing team's base. The game features a ranked system that matches players of similar skill levels for fair competition.
A major issue in ranked play is the presence of smurfs — experienced players who create new low-level accounts to play against less experienced opponents. This disrupts matchmaking, creates unfair games, and negatively impacts the experience of legitimate players.
This project aims to detect smurf accounts by analyzing player performance data. The pipeline involves scraping player rank data from op.gg, collecting match statistics via the Riot Games API, performing data cleaning and feature engineering (e.g., KDA ratio, gold per minute), and training a machine learning model to identify outlier accounts likely to be smurfs.
Riot API Integration
Verified Developer Account with Riot Games.
Created and managed personal project API keys.
Integrated Riot API for retrieving game and player data.
AWS Cloud Infrastructure
Set up AWS EC2 Instance (Amazon Linux 2023) tailored to security and project needs.
Configured security groups and an Elastic IP for stable hosting.
Used the EC2 instance for:
Collaboration and Tools
Version Control: GitHub for repository management.
Collaboration: DeepNote for live Jupyter notebook development.
IDE & Local Development: Heavy usage of PyCharm.
File Transfers: Integrated AWS workflows for moving data.
Docker & Web Technologies
Containerized web applications with Docker.
Implemented routing for worldwide access.
Webserver setup using NGINX with HTML/CSS front-end.
League of Legends Domain Knowledge
Deep dive into the game mechanics:
LiveClient Application
Developed an .exe application to:
Web Scraping & Data Collection
Built multiple web scrapers with BeautifulSoup for:
Feature Engineering
Performed deep feature engineering using Python libraries (Pandas, NumPy, etc.).
Created new heuristic features and indirect attributes derived from multiple raw attributes.
Destructured JSON data into Pandas DataFrames for analysis.
Designed heuristic labels for weakly supervised classification.
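As a sketch of the JSON-to-DataFrame step, the snippet below flattens a simplified, hypothetical match payload with `pandas.json_normalize` and derives two of the features mentioned above (KDA ratio and gold per minute). The field names follow the Riot match-v5 schema, but the values are invented:

```python
import pandas as pd

# Hypothetical, heavily simplified excerpt of a Riot match-v5 response.
match = {
    "info": {
        "gameDuration": 1800,  # seconds
        "participants": [
            {"summonerName": "A", "kills": 10, "deaths": 2, "assists": 8, "goldEarned": 14000},
            {"summonerName": "B", "kills": 1, "deaths": 9, "assists": 3, "goldEarned": 8000},
        ],
    }
}

# Destructure the nested participant records into a flat DataFrame.
df = pd.json_normalize(match["info"]["participants"])

# Derived features: KDA ratio (deaths clamped to 1 to avoid division by zero)
# and gold per minute.
df["kda"] = (df["kills"] + df["assists"]) / df["deaths"].clip(lower=1)
df["gold_per_min"] = df["goldEarned"] / (match["info"]["gameDuration"] / 60)

print(df[["summonerName", "kda", "gold_per_min"]])
```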
Data Preprocessing
Applied imputation methods (mean, mode).
Encoding techniques: Ordinal Encoding and One-Hot Encoding.
Data normalization with StandardScaler.
Dimensionality reduction using PCA.
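A minimal sketch of how these preprocessing steps can be chained in a single scikit-learn `Pipeline` (toy numeric data, not the project's actual feature matrix):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Toy feature matrix with one missing value (illustrative only).
X = np.array([
    [3.0, 450.0, 7.5],
    [np.nan, 300.0, 2.1],
    [5.5, 520.0, 9.0],
    [1.2, 280.0, 1.5],
])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill NaNs with column means
    ("scale", StandardScaler()),                 # zero mean, unit variance
    ("pca", PCA(n_components=2)),                # reduce to 2 components
])

X_reduced = pipe.fit_transform(X)
print(X_reduced.shape)
```

Categorical columns would be handled the same way via ordinal or one-hot encoders, typically wired in with a `ColumnTransformer`.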
Data Visualization & Analysis
Machine Learning Models
Focus on Scikit-Learn and ML algorithms:
Anomaly detection models
Model Evaluation
Used ROC curves, confusion matrices, and classification reports for evaluation.
Compared predictions against heuristic labels to validate performance.
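For illustration, these metrics can be produced as below; the labels and scores here are invented, standing in for the heuristic labels and a model's predictions:

```python
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report

# Invented heuristic labels and model scores, purely for illustration.
y_heuristic = [0, 0, 1, 1, 0, 1]               # weak labels (smurf vs. not)
y_score = [0.1, 0.3, 0.8, 0.7, 0.2, 0.9]       # model's smurf probability
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

auc = roc_auc_score(y_heuristic, y_score)      # area under the ROC curve
cm = confusion_matrix(y_heuristic, y_pred)     # rows: true class, cols: predicted
print("ROC AUC:", auc)
print(cm)
print(classification_report(y_heuristic, y_pred))
```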
Project Structure & Documentation
CRISP-DM Workflow
README Overview
Overview
League of Legends (LoL) is a popular multiplayer online battle arena game where two teams of five compete to destroy each other’s base. A recurring problem is smurfing — experienced players creating new low-ranked accounts to play against beginners, causing unfair matches.
This project detects smurfs by analyzing gameplay data. The pipeline includes scraping player ranks from op.gg, collecting match data via the Riot API, cleaning and enriching the data with ranks, engineering features like KDA ratio and gold per minute, and training outlier detection models (Isolation Forest, One-Class SVM, Local Outlier Factor, Neural Networks).
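As a minimal sketch of the outlier-detection idea, an Isolation Forest can be fit on synthetic player features (the values below are invented; `-1` marks rows the model considers anomalous):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic "normal player" features (e.g., KDA, gold/min), plus two extreme rows.
normal = rng.normal(loc=[2.5, 400.0], scale=[1.0, 60.0], size=(200, 2))
outliers = np.array([[12.0, 900.0], [15.0, 1000.0]])
X = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = inlier

print("flagged rows:", np.where(labels == -1)[0])
```

One-Class SVM and Local Outlier Factor expose the same `fit_predict` interface, so the models can be swapped behind a common evaluation loop.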
Setup & Preparation
Data Collection
Use the Riot API to:
from bs4 import BeautifulSoup
import glob
import os

# Path to offline-saved HTML files
path = os.path.join(os.path.dirname(__file__), "NameScrapeTxt")
files = sorted(glob.glob(os.path.join(path, "*.txt")))

players = []
for file in files:
    with open(file, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    for row in soup.select("tr"):
        name = row.select_one("span.whitespace-pre-wrap.text-gray-900")
        tag = row.select_one("span.text-gray-500.truncate")
        if name and tag:
            players.append(f"{name.text.strip()}{tag.text.strip()}")

# Remove duplicates and save results
players = list(set(players))
print(players)
print(f"{len(players)} players found")

with open("alle_spieler.txt", "w", encoding="utf-8") as f:
    for p in players:
        f.write(p + "\n")
API Fetch Example
from riotwatcher import LolWatcher, RiotWatcher
from dotenv import load_dotenv
import os, json, time
from tqdm import tqdm

# Load API key and region info
load_dotenv()
api_key = os.getenv("RIOT_API_KEY")
platform = os.getenv("PLATFORM")
region = os.getenv("REGION")

lol_watcher = LolWatcher(api_key)
riot_watcher = RiotWatcher(api_key)

def riot_api_request(func, *args, max_retries=3, sleep=2, **kwargs):
    """Generic retry wrapper for Riot API calls"""
    for attempt in range(1, max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(f"[Retry {attempt}/{max_retries}] Error: {e}")
            if attempt == max_retries:
                raise
            time.sleep(sleep)

# Example: Fetch match IDs. Note that the account and match-v5 endpoints
# take the regional route (e.g. "europe"), not the platform (e.g. "euw1").
account = riot_api_request(
    riot_watcher.account.by_riot_id,
    region, "SummonerName", "TAG"
)
match_ids = riot_api_request(
    lol_watcher.match.matchlist_by_puuid,
    region, account["puuid"], count=5
)
print(match_ids)
Feature Engineering
This step transforms raw gameplay data into structured machine learning input. It includes several preprocessing steps and the design of domain-specific features to detect smurf behavior effectively.
The following features were derived or selected from the match data. They are designed to capture mechanical skill, team contribution, and player dominance in matches.
Heuristic Features
To simulate smurf behavior (since no ground truth exists), we engineered a custom score (`smurf_score`) and a binary label (`smurf_flag`) using domain knowledge and rules. Features were combined into a weighted score to assign weakly supervised labels. These labels serve as training and evaluation targets for the classification and anomaly detection models.
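The rule-to-score combination might look like the following sketch; the column names, thresholds, and weights here are illustrative assumptions, not the project's actual rule set:

```python
import pandas as pd

# Hypothetical per-player aggregates (invented values).
df = pd.DataFrame({
    "account_level": [34, 250, 41],
    "avg_kda": [6.8, 3.1, 8.2],
    "winrate": [0.78, 0.52, 0.81],
})

# Each rule is a boolean condition paired with a weight (illustrative).
rules = {
    "low_level": (df["account_level"] < 50, 0.4),  # suspiciously new account
    "high_kda": (df["avg_kda"] > 5.0, 0.3),        # unusually strong mechanics
    "high_winrate": (df["winrate"] > 0.65, 0.3),   # dominating games
}

# Weighted sum of triggered rules, then a threshold for the binary label.
df["smurf_score"] = sum(cond.astype(float) * w for cond, w in rules.values())
df["smurf_flag"] = (df["smurf_score"] >= 0.7).astype(int)
print(df[["smurf_score", "smurf_flag"]])
```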
Model Training
The training pipeline consists of two parallel approaches: unsupervised anomaly detection and supervised classification, both evaluated against the same weakly supervised label (`smurf_flag`).
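The supervised branch can be sketched as below, training a classifier against the weak label on a holdout split. The data is synthetic and `RandomForestClassifier` is an illustrative choice, not necessarily the model used in the project:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic features (stand-ins for e.g. KDA, gold/min, winrate).
X = rng.normal(size=(300, 3))
# Synthetic weak label, playing the role of smurf_flag.
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print("holdout accuracy:", acc)
```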
Evaluation
All models, both supervised and unsupervised, are evaluated using standard classification metrics. Since the ground-truth labels are derived heuristically (`smurf_flag`), this step assesses how well the models replicate or generalize the heuristic logic.
t-SNE Visualization
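A minimal t-SNE sketch with scikit-learn, projecting synthetic high-dimensional player features to 2-D for plotting (the data and its two-cluster structure are invented):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Two synthetic clusters, standing in for "normal" vs. "smurf-like" players.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 5)),
    rng.normal(5.0, 1.0, size=(50, 5)),
])

# Embed into 2 dimensions; perplexity must stay below the sample count.
emb = TSNE(n_components=2, perplexity=30, random_state=1).fit_transform(X)
print(emb.shape)
```

The resulting 2-D embedding is what gets scatter-plotted, typically colored by `smurf_flag` or by model prediction.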
Requirements.txt
absl-py==2.3.1
anyio==4.10.0
argon2-cffi==25.1.0
argon2-cffi-bindings==25.1.0
arrow==1.3.0
asttokens==3.0.0
astunparse==1.6.3
async-lru==2.0.5
attrs==25.3.0
babel==2.17.0
beautifulsoup4==4.13.4
bleach==6.2.0
bs4==0.0.2
cabarchive==0.2.4
certifi==2025.8.3
cffi==1.17.1
charset-normalizer==3.4.2
colorama==0.4.6
comm==0.2.3
contourpy==1.3.3
cx_Freeze==8.3.0
cx_Logging==3.2.1
cycler==0.12.1
debugpy==1.8.16
decorator==5.2.1
defusedxml==0.7.1
executing==2.2.0
fastjsonschema==2.21.1
filelock==3.18.0
flatbuffers==25.2.10
fonttools==4.59.0
fqdn==1.5.1
gast==0.6.0
google-pasta==0.2.0
grpcio==1.74.0
h11==0.16.0
h5py==3.14.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
ipykernel==6.30.1
ipython==9.4.0
ipython_pygments_lexers==1.1.1
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
joblib==1.5.1
json5==0.12.0
jsonpointer==3.0.0
jsonschema==4.25.0
jsonschema-specifications==2025.4.1
jupyter-events==0.12.0
jupyter-lsp==2.2.6
jupyter_client==8.6.3
jupyter_core==5.8.1
jupyter_server==2.16.0
jupyter_server_terminals==0.5.3
jupyterlab==4.4.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
keras==3.11.1
kiwisolver==1.4.8
lark==1.2.2
libclang==18.1.1
lief==0.16.5
Markdown==3.8.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.5
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.1.3
ml_dtypes==0.5.3
namex==0.1.0
nbclient==0.10.2
nbconvert==7.16.6
nbformat==5.10.4
nest-asyncio==1.6.0
notebook==7.4.5
notebook_shim==0.2.4
numpy==2.3.2
opt_einsum==3.4.0
optree==0.17.0
overrides==7.7.0
packaging==25.0
pandas==2.3.1
pandas-stubs==2.3.0.250703
pandocfilters==1.5.1
parso==0.8.4
pillow==11.3.0
platformdirs==4.3.8
prometheus_client==0.22.1
prompt_toolkit==3.0.51
protobuf==5.29.5
psutil==7.0.0
pure_eval==0.2.3
pycparser==2.22
Pygments==2.19.2
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-json-logger==3.3.0
pytz==2025.2
pywin32==311
pywinpty==2.0.15
PyYAML==6.0.2
pyzmq==27.0.1
referencing==0.36.2
requests==2.32.4
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rfc3987-syntax==1.1.0
rich==14.1.0
riotwatcher==3.3.1
rpds-py==0.27.0
scikit-learn==1.7.1
scipy==1.16.1
seaborn==0.13.2
Send2Trash==1.8.3
setuptools==80.4.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.7
stack-data==0.6.3
striprtf==0.0.29
tensorboard==2.20.0
tensorboard-data-server==0.7.2
tensorflow==2.20.0rc0
termcolor==3.1.0
terminado==0.18.1
threadpoolctl==3.6.0
tinycss2==1.4.0
tornado==6.5.1
tqdm==4.67.1
traitlets==5.14.3
ttkbootstrap==1.14.2
types-python-dateutil==2.9.0.20250708
typing_extensions==4.14.1
tzdata==2025.2
uri-template==1.3.0
urllib3==2.5.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.1.3
wheel==0.45.1
wrapt==1.17.2
xgboost==3.0.3
Docker and Certification
The primary goal was to implement a single Docker Compose file (docker-compose.yml). The only issue was certificate provisioning. The command below performs a one-time run of the Certbot client in a temporary Docker container to obtain a TLS/SSL certificate from the Let's Encrypt certificate authority:
docker run --rm \
  -v /data/compose/9/web:/usr/share/nginx/html \
  -v /data/compose/9/letsencrypt:/etc/letsencrypt \
  certbot/certbot:latest certonly \
  --non-interactive --agree-tos --keep-until-expiring \
  --email "*****@****.de" \
  --webroot -w /usr/share/nginx/html \
  -d eneemr.sabuncuoglu.de
The Docker Compose YAML
version: "3.8"

services:
  nginx_http:
    image: nginx:1.27-alpine
    container_name: nginx_http
    restart: unless-stopped
    ports:
      - "80:80"  # Serve HTTP traffic on IPv4/IPv6
    volumes:
      - /data/compose/9/web:/usr/share/nginx/html:ro  # Static web content (read-only)

  certbot:
    image: certbot/certbot:latest
    container_name: certbot
    restart: unless-stopped
    volumes:
      - /data/compose/9/web:/usr/share/nginx/html     # Webroot for ACME HTTP-01 challenge
      - /data/compose/9/letsencrypt:/etc/letsencrypt  # Persistent certificate/key storage
    entrypoint: ["/bin/sh", "-lc"]
    command:
      # Literal block scalar ("|") so each shell line keeps its own newline;
      # a folded scalar (">") would join lines and let "#" comments swallow them.
      - |
        set -e  # Abort on any command error
        # Ensure required dirs exist
        mkdir -p /usr/share/nginx/html/.well-known/acme-challenge /etc/letsencrypt
        # Initial certificate issuance (only if no cert present)
        if [ ! -f /etc/letsencrypt/live/eneemr.sabuncuoglu.de/fullchain.pem ]; then
          echo "[certbot] requesting initial certificate for eneemr.sabuncuoglu.de"
          certbot certonly \
            --non-interactive --agree-tos --keep-until-expiring \
            --email "******@*****.de" \
            --webroot -w /usr/share/nginx/html \
            -d eneemr.sabuncuoglu.de || true  # Ignore failure to avoid container crash
        fi
        # Renewal loop: run every 12h
        while :; do
          # Attempt silent renewal
          certbot renew --webroot -w /usr/share/nginx/html --quiet || true
          sleep 12h
        done

  certbot_renew:
    image: certbot/certbot:latest
    container_name: certbot_renew
    restart: unless-stopped
    volumes:
      - /data/compose/9/web:/usr/share/nginx/html
      - /data/compose/9/letsencrypt:/etc/letsencrypt
    command:
      - /bin/sh
      - -lc
      - |
        # Renewal-only loop (no initial issuance logic)
        while :; do
          certbot renew --webroot -w /usr/share/nginx/html --quiet || true
          sleep 12h
        done

  nginx_https_redirect:
    image: nginx:1.27-alpine
    container_name: nginx_https_redirect
    restart: unless-stopped
    depends_on:
      - certbot  # Start after the certbot container (startup order only)
    ports:
      - "443:443"  # Serve HTTPS connections
    volumes:
      - /data/compose/9/letsencrypt:/etc/letsencrypt:ro  # Read-only mount of issued certs
    entrypoint: ["/bin/sh", "-lc"]
    command:
      - |
        set -e
        # Wait until certificate + key exist before starting Nginx
        while [ ! -f /etc/letsencrypt/live/eneemr.sabuncuoglu.de/fullchain.pem ] ||
              [ ! -f /etc/letsencrypt/live/eneemr.sabuncuoglu.de/privkey.pem ]; do
          echo "[https] waiting for certificates..."
          sleep 5
        done
        # Generate minimal nginx.conf: TLS termination with 308 redirect to HTTP
        printf '%s\n' '
        worker_processes auto;
        error_log /var/log/nginx/error.log warn;
        pid /var/run/nginx.pid;
        events { worker_connections 1024; }
        http {
          include /etc/nginx/mime.types;
          default_type application/octet-stream;
          access_log /var/log/nginx/access.log;
          sendfile on;
          keepalive_timeout 65;
          server_tokens off;
          server {
            listen 443 ssl http2;
            listen [::]:443 ssl http2;
            server_name eneemr.sabuncuoglu.de;
            ssl_certificate /etc/letsencrypt/live/eneemr.sabuncuoglu.de/fullchain.pem;
            ssl_certificate_key /etc/letsencrypt/live/eneemr.sabuncuoglu.de/privkey.pem;
            ssl_session_timeout 1d;
            ssl_session_cache shared:SSL:10m;
            ssl_protocols TLSv1.2 TLSv1.3;
            ssl_ciphers HIGH:!aNULL:!MD5;
            ssl_prefer_server_ciphers on;
            location / { return 308 http://$$host$$request_uri; }  # Preserve method/body in redirect
          }
        }' > /etc/nginx/nginx.conf
        exec nginx -g 'daemon off;'
GitHub Repository
Ethics & Privacy
PROJECT OWNERS
- Enes Sabuncuoglu
- 2nd Semester Master Data Science
- Emre Tahir Tursun
- 1st Semester Master Data Science