ScholarshipID — Two-Tower Recommendation Model

title	ScholarshipID
emoji	🎓
colorFrom	blue
colorTo	green
sdk	docker
pinned	false


A scholarship recommendation system that uses a Two-Tower (Dual Encoder) architecture to match high school (SMA) student profiles with overseas undergraduate (S1) scholarships, returning the top-5 most relevant scholarships for each student.

Architecture

Student Tower                     Scholarship Tower
  Input(506)                         Input(509)
  Dense(256, relu)                   Dense(256, relu)
  Dense(128, relu)                   Dense(128, relu)
  L2Normalize                        L2Normalize
      │                                   │
      └──────── Dot Product ──────────────┘
                     │
              Top-5 Ranking
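The diagram can be sketched in NumPy as a forward pass through each tower followed by brute-force retrieval. This is a minimal illustration, not the project's Keras code: the weights are random placeholders, the inputs are random vectors sized as described in the list below (506 and 509), and `init_tower`, `encode` and `top_k` are hypothetical names.

```python
import numpy as np

# Random placeholder weights (hypothetical): the real towers are trained Keras models.
rng = np.random.default_rng(0)


def init_tower(input_dim: int) -> list[tuple[np.ndarray, np.ndarray]]:
    """Weights for Dense(256, relu) -> Dense(128, relu)."""
    dims = [input_dim, 256, 128]
    return [
        (rng.normal(0.0, 0.05, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def encode(tower: list[tuple[np.ndarray, np.ndarray]], x: np.ndarray) -> np.ndarray:
    """Apply each Dense + ReLU layer, then L2-normalize every row."""
    h = x
    for w, b in tower:
        h = np.maximum(h @ w + b, 0.0)
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norms, 1e-12)  # guard against division by zero


def top_k(student_emb: np.ndarray, scholarship_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force dot product against every scholarship; top-k indices per student."""
    scores = student_emb @ scholarship_emb.T
    return np.argsort(-scores, axis=1)[:, :k]


student_tower = init_tower(122 + 384)      # structured features + SBERT embedding = 506
scholarship_tower = init_tower(125 + 384)  # = 509

students = rng.normal(size=(3, 506))
scholarships = rng.normal(size=(44, 509))

s_emb = encode(student_tower, students)          # shape (3, 128)
c_emb = encode(scholarship_tower, scholarships)  # shape (44, 128)
recommendations = top_k(s_emb, c_emb)            # shape (3, 5)
```

Because both embeddings are L2-normalized, the dot product equals cosine similarity, so exact retrieval over a catalogue this small needs no approximate-nearest-neighbour index.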

Student Tower: concat(structured_features=122, text_emb=384) → 128-dim L2-normalized embedding
Scholarship Tower: concat(structured_features=125, text_emb=384) → 128-dim L2-normalized embedding
Text Encoder: Sentence-BERT all-MiniLM-L6-v2 (384-dim, frozen, pre-computed)
Retrieval: brute-force dot product against all 44 scholarships
Loss: Sampled softmax + in-batch negatives, temperature=0.1, sample weighting (accepted=5×, apply=2×, click=1×)
Metrics: Recall@5, NDCG@5, MRR
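A minimal NumPy sketch of the weighted in-batch softmax loss, assuming each batch row is a positive (student, scholarship) pair tagged with its interaction type. It covers only the in-batch negatives; the extra sampled negatives of sampled softmax are omitted, and `weighted_in_batch_softmax_loss` is a hypothetical name.

```python
import numpy as np

# Sample weights from the loss description: accepted=5x, apply=2x, click=1x.
INTERACTION_WEIGHT = {"accepted": 5.0, "apply": 2.0, "click": 1.0}


def weighted_in_batch_softmax_loss(
    student_emb: np.ndarray,
    scholarship_emb: np.ndarray,
    interactions: list[str],
    temperature: float = 0.1,
) -> float:
    """Row i of both matrices is a positive pair; other rows in the batch are negatives."""
    logits = (student_emb @ scholarship_emb.T) / temperature  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_example = -np.diag(log_probs)                          # positives lie on the diagonal
    weights = np.array([INTERACTION_WEIGHT[t] for t in interactions])
    # Normalizing by the total weight is an assumption; dividing by the batch
    # size is another common choice.
    return float((weights * per_example).sum() / weights.sum())
```

With only 44 scholarships, the same scholarship can appear in several rows of one batch, and this naive version then treats it as a negative for the other students who interacted with it; masking such duplicate columns is one common remedy.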

Folder Structure

├── configs/
│   └── default.yaml             # All config: hyperparameters, model checkpoints, server settings
├── data/
│   ├── raw/                     # students.csv, scholarships.csv, feedback.csv
│   ├── processed/
│   └── features/
│       └── text_embeddings/     # Cached SBERT embeddings (.npy)
├── notebooks/
│   └── notebook_two_tower.ipynb # Reference implementation (TF/Keras)
├── outputs/
│   ├── checkpoints/             # student_tower_best.keras, scholarship_tower_best.keras
│   ├── embeddings/              # scholarship_emb.npy, scholarship_ids.npy
│   └── logs/                    # TensorBoard logs (tb_{experiment_name}/)
├── scripts/
│   ├── hf_sync.py                     # HuggingFace artifact sync (pull/push)
│   ├── dataset_generator.py           # Step 0: generate the dataset
│   ├── precompute_text_embeddings.py  # Step 1: cache SBERT embeddings
│   ├── train.py                       # Step 2: training
│   ├── evaluate.py                    # Step 3: evaluate on the test set
│   ├── export_embeddings.py           # Step 4: export embeddings for serving
│   └── serve.py                       # Start the FastAPI inference server
└── src/
    ├── generator/
    │   ├── data_seeds.py              # Hardcoded student and scholarship values
    │   ├── schemas.py                 # Data schemas
    │   └── convert_csv.py             # Helper for saving DataFrames as CSV
    ├── models/
    │   ├── student_tower.py
    │   ├── scholarship_tower.py
    │   └── two_tower.py
    ├── serving/
    │   ├── inference_engine.py        # Inference engine (encode, retrieve)
    │   └── api.py                     # FastAPI endpoints
    ├── trainers/trainer.py
    ├── evaluators/evaluator.py
    ├── utils/
    │   ├── feature_engineering.py
    │   └── data_loader.py

Setup

Windows: make sure the Microsoft Visual C++ Redistributable 2019 is installed.

# Make sure `python` here is the system Python (not conda base), version 3.11 or newer
python -m venv venv # or uv venv venv -p 3.11

# Windows
.\venv\Scripts\Activate.ps1
# Mac/Linux
source venv/bin/activate

pip install -r requirements.txt # or use yusr-requirements.txt for CPU-only compute
pip install -e .

# If you installed from yusr-requirements.txt, also install the CPU build of PyTorch
pip install torch==2.2.2+cpu --index-url https://download.pytorch.org/whl/cpu

# Optional: if TensorBoard fails to launch
pip install 'setuptools<75'

HuggingFace Setup (Artifact Sync)

To sync model artifacts and data with HuggingFace:

# 1. Copy the example env file
cp .env.example .env
# 2. Set HF_TOKEN in .env (create a token at https://huggingface.co/settings/tokens)
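To check that the token is picked up, here is a small stdlib-only sketch that reads `.env` as plain `KEY=VALUE` lines. The project's scripts may load the file differently (for example with python-dotenv); `parse_env` and `has_hf_token` are hypothetical helpers.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return env


def has_hf_token(path: str = ".env") -> bool:
    """True if the file exists and defines a non-empty HF_TOKEN."""
    env_file = Path(path)
    if not env_file.exists():
        return False
    return bool(parse_env(env_file.read_text(encoding="utf-8")).get("HF_TOKEN"))
```

Running `has_hf_token()` from the repository root before `pull-model` gives a quick signal that the copy-and-edit step was done.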

Quick Start

# Step 0: generate the synthetic dataset (students.csv, scholarships.csv, feedback.csv)
python scripts/dataset_generator.py # or python -m scripts.dataset_generator

# Step 1: pre-compute text embeddings (run once, ~5-10 minutes)
python scripts/precompute_text_embeddings.py # or python -m scripts.precompute_text_embeddings

# Step 2: train the model
python scripts/train.py --config configs/default.yaml # or python -m scripts.train --config configs/default.yaml

# Step 3: evaluate on the test set (checkpoint paths default to configs/default.yaml)
# (or: python -m scripts.evaluate --config configs/default.yaml)
python scripts/evaluate.py \
  --config configs/default.yaml

# Step 4: export scholarship embeddings for serving (checkpoint path defaults to the config)
# (or: python -m scripts.export_embeddings --config configs/default.yaml)
python scripts/export_embeddings.py \
  --config configs/default.yaml
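Steps 0-4 must run in order, since later steps consume earlier steps' outputs. A hypothetical driver (for example saved as `run_all.py` in the repository root; neither the file nor its functions exist in the project) could chain them with `subprocess`, stopping at the first failure:

```python
import subprocess
import sys

CONFIG = "configs/default.yaml"

# Quick Start steps 0-4, in order, with the same arguments as the commands above.
PIPELINE = [
    ["scripts/dataset_generator.py"],
    ["scripts/precompute_text_embeddings.py"],
    ["scripts/train.py", "--config", CONFIG],
    ["scripts/evaluate.py", "--config", CONFIG],
    ["scripts/export_embeddings.py", "--config", CONFIG],
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Prefix every step with the interpreter of the active virtual environment."""
    return [[python, *step] for step in PIPELINE]


def run_pipeline(dry_run: bool = False) -> None:
    """Run every step from the repository root; stop at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

`run_pipeline(dry_run=True)` only prints the commands, which is a cheap way to check paths before a long training run.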

HuggingFace Artifact Sync

Two separate repos are used for syncing:

Repo	Contents	Type
`ydmhmhm/scholarshipid-data`	`data/raw/`, `outputs/logs/`	Dataset
`ydmhmhm/scholarshipid-model`	`checkpoints/`, `embeddings/`	Model

CLI commands:

# Pull data + model from HuggingFace (before starting serving)
python scripts/hf_sync.py pull-data --config configs/default.yaml
python scripts/hf_sync.py pull-model --config configs/default.yaml

# Push data + model to HuggingFace (after retraining/refreshing)
python scripts/hf_sync.py push-data --config configs/default.yaml --message "New data"
python scripts/hf_sync.py push-model --config configs/default.yaml --message "Retrained"
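The pull/push commands map naturally onto `snapshot_download` and `upload_folder` from the `huggingface_hub` library. The sketch below is an assumption about how `scripts/hf_sync.py` might work, not its actual code: the local mirror roots (`.` for the data repo, `outputs/` for the model repo) are chosen to match the table above, and `SYNC_TARGETS`, `local_folders`, `pull` and `push` are hypothetical.

```python
import os

# kind -> (repo_id, repo_type, local mirror root, folders relative to that root).
# The mirror roots are assumptions chosen to match the repo table.
SYNC_TARGETS = {
    "data": ("ydmhmhm/scholarshipid-data", "dataset", ".", ["data/raw", "outputs/logs"]),
    "model": ("ydmhmhm/scholarshipid-model", "model", "outputs", ["checkpoints", "embeddings"]),
}


def local_folders(kind: str) -> list[str]:
    """Local paths mirrored for one repo kind."""
    _, _, root, folders = SYNC_TARGETS[kind]
    return [os.path.join(root, folder) for folder in folders]


def pull(kind: str) -> None:
    """Download only the mirrored folders of one repo into its local root."""
    from huggingface_hub import snapshot_download  # imported lazily

    repo_id, repo_type, root, folders = SYNC_TARGETS[kind]
    snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=root,
        allow_patterns=[f"{folder}/*" for folder in folders],
        token=os.environ.get("HF_TOKEN"),  # None falls back to a cached login
    )


def push(kind: str, message: str) -> None:
    """Upload each mirrored folder to the same path inside the repo."""
    from huggingface_hub import upload_folder  # imported lazily

    repo_id, repo_type, root, folders = SYNC_TARGETS[kind]
    for folder in folders:
        upload_folder(
            repo_id=repo_id,
            repo_type=repo_type,
            folder_path=os.path.join(root, folder),
            path_in_repo=folder,
            commit_message=message,
            token=os.environ.get("HF_TOKEN"),
        )
```

`allow_patterns` restricts the download to the listed folders, so repo-level files such as a README are not written into the project root.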

Auto-integration:

scripts/serve.py — pulls both repos before FastAPI starts
scripts/retrain.py — pushes both repos after retraining
src/serving/api.py /retrain endpoint — pushes both repos (data + model) on API retrain
src/serving/api.py /refresh endpoint — pushes data only after refreshing scholarship cache

Docker Deployment (HuggingFace Spaces)

The project includes a Dockerfile for deploying on HuggingFace Spaces with Docker runtime.

# Build locally
docker build -t scholarshipid-model .

# Run locally (sets SERVER_PORT=7860 to match HF Spaces)
docker run -p 7860:7860 \
  --name scholarship-id \
  -e HF_TOKEN=your_token_here \
  -e SERVER_PORT=7860 \
  scholarshipid-model

How it works:

Container starts → serve.py runs automatically
Pulls models/data from HuggingFace repos (configured in .env)
Starts FastAPI on port 7860

Deploy to HF Spaces:

Push your repo to GitHub
Create a new Space → select Docker runtime
Connect your GitHub repo and give it a name like scholarshipid-api
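Set HF_TOKEN as a secret in the Space settings (Settings → Variables and secrets); if a GitHub Actions workflow pushes to the Space, also add it in the GitHub repo (Settings → Secrets and variables → Actions)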
The API will be live at https://<username>-<space-name>.hf.space

Data

File	Rows	Description
`students.csv`	20,000	High school (SMA) student profiles
`scholarships.csv`	43	Overseas undergraduate (S1) scholarships
`feedback.csv`	100,000	Interactions: click / apply / accepted
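Each row of `feedback.csv` is an interaction event, and the loss weights them as accepted=5×, apply=2×, click=1×. A minimal sketch of turning events into weighted training pairs, assuming hypothetical column names `student_id`, `scholarship_id` and `interaction`, and keeping the strongest interaction when a pair repeats (a design choice, not confirmed by the source):

```python
import csv
import io

# Weights from the loss description: accepted=5x, apply=2x, click=1x.
INTERACTION_WEIGHT = {"click": 1.0, "apply": 2.0, "accepted": 5.0}


def weighted_pairs(rows) -> dict[tuple[str, str], float]:
    """Collapse repeated (student, scholarship) events, keeping the strongest signal."""
    best: dict[tuple[str, str], float] = {}
    for row in rows:
        key = (row["student_id"], row["scholarship_id"])
        weight = INTERACTION_WEIGHT[row["interaction"]]
        best[key] = max(best.get(key, 0.0), weight)
    return best


def load_pairs(path: str = "data/raw/feedback.csv") -> dict[tuple[str, str], float]:
    with open(path, newline="", encoding="utf-8") as f:
        return weighted_pairs(csv.DictReader(f))


# Tiny in-memory example with the assumed columns.
sample = io.StringIO(
    "student_id,scholarship_id,interaction\n"
    "s1,b7,click\n"
    "s1,b7,accepted\n"
    "s2,b3,apply\n"
)
pairs = weighted_pairs(csv.DictReader(sample))
```

Summing weights instead of taking the maximum is an equally plausible choice; it would reward repeated clicks, which may or may not be desirable.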

Monitoring (TensorBoard)

TensorBoard logs are written to outputs/logs/tb_{experiment_name}/.

tensorboard --logdir outputs/logs/ --bind_all

Serving (FastAPI)

After training, start the inference server:

# Start the serving server
python scripts/serve.py # or python -m scripts.serve

The server runs on http://localhost:<server.port> (set in configs/default.yaml) and exposes the following endpoints:

GET `/docs` — Swagger docs
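FastAPI publishes its OpenAPI schema at `/openapi.json` by default, and `/docs` renders that schema. A small stdlib sketch that lists every route from it; `list_endpoints` and `fetch_openapi` are hypothetical helpers, and this assumes the schema URL has not been disabled or moved in `src/serving/api.py`:

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_endpoints(spec: dict) -> list[str]:
    """Turn an OpenAPI document into sorted 'METHOD /path' strings."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)


def fetch_openapi(base_url: str) -> dict:
    """Download the schema FastAPI serves at /openapi.json by default."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

For example, `list_endpoints(fetch_openapi("http://localhost:7860"))` works against the Docker setup above; for a local run, substitute the port from `server.port`.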

Configuration

All configuration is in configs/default.yaml:

Model checkpoints: models.student_tower, models.scholarship_tower
Server settings: server.host, server.port, server.cors_origins
Auth: server.auth_required, server.auth_token (set for production)
Retraining: retraining.holdout_fraction (0.0 = use all data)
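Settings are addressed with dotted paths such as `server.port`. A sketch of loading the file with PyYAML and resolving those paths, plus an illustration of what `holdout_fraction` plausibly controls; `get_setting`, `load_config` and `holdout_split` are hypothetical, and the project's actual split logic may differ:

```python
import random
from typing import Any


def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up a nested key such as 'server.port'; return default if any part is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def load_config(path: str = "configs/default.yaml") -> dict:
    import yaml  # PyYAML, imported lazily

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


def holdout_split(items: list, fraction: float, seed: int = 42) -> tuple[list, list]:
    """Split off a random holdout set; fraction=0.0 keeps everything for training."""
    if fraction <= 0.0:
        return list(items), []
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]
```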

Performance (test set)

Metric	Score
Recall@5	~0.32
NDCG@5	~0.22
MRR	~0.21
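A reference sketch of the three metrics with binary relevance, assuming each student has a set of relevant scholarship IDs and a ranked list of recommendations. Whether MRR is truncated at rank 5 is not stated here, so this version scans the full ranking; all function names are hypothetical.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    # Binary relevance: a hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    for i, item in enumerate(ranked):
        if item in relevant:
            return 1.0 / (i + 1)
    return 0.0


def mean_metrics(rankings: list[list[str]], relevants: list[set[str]], k: int = 5) -> dict[str, float]:
    """Average each metric over all students."""
    n = len(rankings)
    pairs = list(zip(rankings, relevants))
    return {
        f"Recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in pairs) / n,
        f"NDCG@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in pairs) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in pairs) / n,
    }
```

With a single relevant scholarship ranked third, Recall@5 is 1.0, NDCG@5 is 0.5 and the reciprocal rank is 1/3, which shows how NDCG and MRR penalize a correct but low-ranked hit that Recall ignores.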
