PhD Application · School of Economics and Business · University of Ljubljana

Mohammad (Nick) Nikbakht

Young Researcher Position · Digital Mobility & Data Economy · Supervisor: Prof. Aleš Groznik · April 2026

This page accompanies my formal PhD application for the Young Researcher position at the School of Economics and Business, University of Ljubljana. It presents my research background, the research problem I intend to pursue, the methodological pipeline I plan to develop, and a prototype of the expected research output. All of it is framed within Prof. Groznik's area of digital mobility transformation and platform economics, under whose supervision the proposed doctoral work would be carried out.

Penang, Malaysia

nikbakht@student.usm.my

iNick.tech

github.com/inick-tech

ORCID: 0009-0006-7232-9851

Education

Experience

Publications

Awards

Technical Skills

March 2025 – July 2026

M.Sc. in Mathematics GPA 3.41 / 4.00

Universiti Sains Malaysia (USM) · Penang, Malaysia

Thesis (In Progress): Scalable Optimization Algorithms for High-Dimensional Machine Learning with Applications in NLP-Driven Decision Systems. Developing tighter convergence analyses for adaptive gradient methods (Adam, AdaGrad, AMSGrad) under realistic non-convex conditions, with applications to fine-tuning transformer models for text classification tasks.

September 2020 – February 2024

MBA in Marketing 18.09 / 20.00

Kharazmi University · Tehran, Iran

Thesis: Deconstructing the Global Retail Customer Experience: A Transformer-Based Analysis of Thematic Drivers and Sentiment on Reddit. Applied BERT and RoBERTa models to 50,000+ Reddit discussions; achieved 87% sentiment classification accuracy using fine-tuned language models. Manuscript currently under review at International Journal of Electronic Marketing and Retailing.

September 2015 – September 2019

B.Sc. in Mathematics 17.09 / 20.00

University of Isfahan · Isfahan, Iran

Foundation in pure and applied mathematics including analysis, linear algebra, numerical methods, and probability theory. Served as Teaching Assistant in Data Structures, Mathematics Laboratory (MATLAB & R), and Numerical Analysis.

August 2020 – January 2021

Data Analysis Training Program

Rahnema College · Tehran, Iran

Intensive industry-oriented training in data analysis, machine learning workflows, and real-world problem solving in a mentor-led environment.

Data Analyst

EDGE Company · Muscat, Oman

Apr 2024 – Feb 2025

Built end-to-end data pipelines and predictive models using Python and R to optimize business decision-making in a mobility-adjacent urban market.
Observed firsthand how ride-hailing platforms had restructured commuting patterns, market entry barriers, and transport provider viability across the city.
Translated large-scale analytical findings into actionable business strategies through cross-functional collaboration.

Data Analyst

Danesh Omran Development Company · Tehran, Iran

Mar 2021 – Mar 2022

Transformed raw web data into customer insights by implementing robust data collection and cleaning protocols.
Identified user engagement patterns to align strategic reporting with evolving customer needs.

Data Analyst Intern

Rahnema College · Tehran, Iran

Aug 2020 – Jan 2021

Solved real-world data challenges in a mentor-led, intensive team environment.
Mastered industry-standard data analysis workflows through hands-on execution of technical projects.

Teaching Assistant

Multiple institutions · 2016 – 2023

Statistics for Business and Management (MBA level, Kharazmi University, 2023)
Marketing Strategy (MBA level, Kharazmi University, 2021)
Data Structures · Numerical Analysis · Mathematics Laboratory I & II (Undergraduate, University of Isfahan, 2016 – 2018)

Deconstructing the Global Retail Customer Experience: A Transformer-Based Analysis of Thematic Drivers and Sentiment on Reddit (Under Review)

Nikbakht, M. (2024). International Journal of Electronic Marketing and Retailing.

A Complete Guide to the R Programming Language: From Foundations to Data Analysis Applications (Book)

Nikbakht, M. (2022). Isfahan University Press, 320 pages.

Open Source

Global Retail CX Analysis Pipeline

Full Python codebase: BERTopic + RoBERTa sentiment pipeline · github.com/inick-tech/Global-Retail-CX-Analysis

Best Student of the Master's Degree, Universiti Sains Malaysia (2025)
Best Student of the Master's Degree, Kharazmi University of Tehran (2022)
Best Student of the University (last two semesters), University of Isfahan (2019)
Ranked 319 among more than 18,000 participants — Iran National Master's Entrance Exam (2020)
Ranked 1,439 among more than 75,000 participants — Iran National University Entrance Exam (2015)
IELTS Academic 7.0 (December 2023)

Languages

Python · R · MATLAB · SQL · Stata · LaTeX

Machine Learning

PyTorch · TensorFlow · Scikit-learn · XGBoost · SHAP

NLP & Text Analytics

Hugging Face · BERTopic · spaCy · NLTK · UMAP · HDBSCAN

Econometrics

fixest · plm · AER · did · DiD / IV · Panel FE

Data & Visualization

Tableau · ggplot2 · Pandas · NumPy · Plotly

Human Languages

Persian (native) · English (IELTS 7.0)

Background

Who I Am and How I Got Here

My academic trajectory is not a straight line, and I think it is worth being direct about that. I started in pure mathematics, moved into an MBA with a substantive quantitative component, spent nearly a year working as a data analyst in Muscat, and am now finishing an M.Sc. in Mathematics at USM with a thesis on scalable optimization algorithms for high-dimensional machine learning. The common thread across all of that, which I only understood clearly in retrospect, is a sustained interest in what rigorous quantitative methods can tell us about complex systems where human behavior and algorithmic processes interact.

My MSc thesis at USM develops tighter convergence analyses for adaptive gradient methods, specifically Adam and its variants, in high-dimensional non-convex settings. The practical motivation comes from fine-tuning large language models for text classification, where instability during training is a documented problem whose mathematical explanation is less well understood than the engineering workarounds practitioners use. The theoretical framework is complete, and the applied experiments are being extended to additional datasets. Submission is scheduled for July 2026.
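To make the object of the thesis concrete: the adaptive methods named above differ mainly in how they scale each coordinate's step. The NumPy sketch below implements one common formulation of the Adam update (bias-corrected first- and second-moment estimates) and of the AMSGrad variant, which replaces the second-moment estimate with its running maximum so that it never decreases in response to a run of small recent gradients. It is an illustrative sketch, not code from the thesis; the hyperparameter defaults are the commonly used values, and the quadratic objective is a toy assumption.

```python
import numpy as np


def init_state(dim):
    """Zero-initialized optimizer state: step count, moments, and AMSGrad's running max."""
    return {"t": 0, "m": np.zeros(dim), "v": np.zeros(dim), "v_max": np.zeros(dim)}


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    """One Adam (or AMSGrad) update; mutates `state` and returns the new parameters."""
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and the squared gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    v = state["v"]
    if amsgrad:
        # AMSGrad: keep the running maximum of v, so the second-moment
        # estimate used in the denominator never decreases
        state["v_max"] = np.maximum(state["v_max"], v)
        v = state["v_max"]
    # Bias correction for the zero-initialized moment estimates
    m_hat = state["m"] / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


# Toy objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself
theta_adam = np.array([1.0, -2.0])
theta_ams = theta_adam.copy()
s_adam, s_ams = init_state(2), init_state(2)
for _ in range(2000):
    theta_adam = adam_step(theta_adam, theta_adam, s_adam, lr=0.01)
    theta_ams = adam_step(theta_ams, theta_ams, s_ams, lr=0.01, amsgrad=True)

print(np.round(theta_adam, 4), np.round(theta_ams, 4))
```

On the first step the bias corrections make the update roughly `lr * sign(grad)` for both methods; the two only diverge later, once AMSGrad's running maximum starts to differ from Adam's moving average. That later regime is where convergence analyses of the two methods part ways.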

My MBA thesis, completed at Kharazmi University, applied BERTopic and a fine-tuned RoBERTa classifier to over 50,000 Reddit discussions about global retail customer experience. The research produced a three-category thematic map of customer discourse, validated statistically, and identified a methodologically interesting null result: the Emotional dimension that prior keyword-based analysis had treated as a discrete category does not form a coherent cluster in transformer-based analysis, which suggests it is a property of all categories rather than a category in itself. A manuscript is currently under review. The full codebase is open-source on GitHub.

Before returning to graduate study at USM, I spent nearly a year at EDGE Company in Muscat building end-to-end predictive pipelines for business decision-making in a city where ride-hailing platforms had, within a short period, fundamentally restructured how people moved around. Working there gave me a practitioner's understanding of what kinds of mobility data exist in the real world, how imperfect and fragmented that data is, and how large the gap is between a descriptive insight and an actionable causal claim. That gap is the methodological problem this dissertation is designed to address.

Research Motivation

Where the Research Problem Comes From

Urban mobility platforms are not neutral intermediaries. They do not simply connect drivers and passengers at a price the market clears. They sit between supply and demand and actively manage both sides simultaneously, allocating drivers algorithmically, nudging users toward particular behaviors through interface design and dynamic pricing, and building proprietary datasets that no external researcher can directly observe. The economic and social consequences of this are significant, uneven across cities and population groups, and still not well understood in the academic literature.

The specific observation that pushed me into this research area came from my time in Muscat. Uber and Careem had changed commuting patterns in that city within a few years. Certain neighborhoods had become meaningfully more accessible than before. Public transit had lost ridership. Some informal transport providers had been pushed out of the market. Most of the platform users I spoke with understood surge pricing in a vague way, but the deeper logic of how the algorithms shaped their choices, waiting times, and longer-run travel behavior was entirely invisible to them. As someone who was spending my days building predictive models in the same city, I found that invisibility methodologically fascinating. The data to understand what was happening existed inside those platforms. The tools to extract behavioral signal from large, imperfect datasets existed in the NLP and machine learning literature. The econometric methods to draw causal conclusions from that evidence existed in applied economics. What was missing was a research program that brought all three together with sufficient methodological rigor.

The digital mobility ecosystem is also changing faster than the academic literature can track. Mobility-as-a-Service platforms are being deployed unevenly across European cities with striking and poorly explained differences in adoption outcomes. Regulators are attempting to govern algorithmic pricing without a clear conceptual framework for what they are trying to prevent or encourage. Platform operators are building competitive advantages around data accumulation, and the empirical evidence for how strong and durable those advantages are is surprisingly thin. These are live questions, not historical ones, and the academic literature has given practitioners relatively little concrete guidance.

Research Problem

Three Gaps Worth Closing

Urban mobility has arguably been reorganized more fundamentally in the past decade than at any point since the mass adoption of the automobile. Platforms like Uber, Grab, and Bolt do not simply move people from one point to another; they actively structure both sides of the market in real time, while public transit systems are simultaneously being asked to integrate with private platforms in ways that nobody fully designed and that we still do not fully understand. The result is an ecosystem with serious economic and social consequences that is considerably harder to study than traditional transport markets, because the most important data sits inside private platforms and the most interesting questions are causal ones that require more than descriptive analysis.

Gap 01 • Behavioral Dynamics

How Algorithmic Systems Shape Travel Behavior Over Time

Most studies of platform competition in mobility focus on market entry and price effects. Far fewer ask how algorithmic systems, specifically recommendation engines, surge pricing mechanisms, loyalty incentives, and interface nudges, actually change individual travel behavior over time. Users do not simply react to prices; they are shaped by the systems they interact with repeatedly, and the cumulative effects of that shaping on modal choice and mobility demand are not well documented. The methodological challenge is that tracking this rigorously requires longitudinal individual-level data, which is genuinely difficult to obtain outside of platform-internal research programs.

Gap 02 • Data as Competitive Advantage

Data Asymmetry and Market Concentration

It is widely assumed in the platform economics literature that larger mobility platforms benefit from data network effects: more users produce more trip data, which improves demand forecasting, which improves service quality, which attracts more users. The feedback loop is theoretically plausible and the implications for market concentration are serious. The empirical evidence, however, is thinner than the theoretical confidence would suggest, partly because quantifying data advantages requires trip-level granularity that regulators and researchers rarely access, and partly because the causal mechanisms are difficult to isolate from other scale economies.
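Testing whether data advantages translate into concentration first requires a concrete concentration measure. One standard choice, used here as an assumption rather than a method the proposal commits to, is the Herfindahl-Hirschman Index (HHI): the sum of squared market shares expressed in percentage points. The sketch computes it from hypothetical per-platform trip counts. In practice the shares would come from the city-quarter panel.

```python
def hhi(trip_counts: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index on a 0-10,000 scale from per-platform volumes."""
    total = sum(trip_counts.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    # Square each platform's share in percentage points and sum
    return sum((100 * q / total) ** 2 for q in trip_counts.values())


# Hypothetical quarterly trip counts in one city (illustrative numbers only)
city_quarter = {"platform_a": 600_000, "platform_b": 300_000, "platform_c": 100_000}
print(round(hhi(city_quarter)))  # 60^2 + 30^2 + 10^2 = 4600
```

A single-platform market scores 10,000, and n equal-sized platforms score 10,000/n. Tracking the index over time alongside proxies for platform data volume is one descriptive starting point. On its own it says nothing about whether data causes concentration; that is the role of the identification strategy for RQ 2.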

Gap 03 • MaaS Adoption Dynamics

Conditions for Successful Mobility-as-a-Service Integration

Mobility-as-a-Service platforms, which bundle different transport modes under a single interface and payment system, are being deployed across European cities with striking differences in adoption rates that existing models do not explain well. We have limited understanding of which behavioral and structural conditions predict adoption success, how a new MaaS entrant reshapes competition between incumbent providers, and what regulatory or institutional features enable or obstruct multimodal integration. This is simultaneously a market design question and a behavioral one, and it requires both types of evidence to answer credibly.

Research Questions

Three Organizing Questions

These three gaps translate into three interconnected research questions that will organize the dissertation. They share a common thread: all of them are ultimately about how digital platforms change behavior and market structure in mobility systems, and all of them require a combination of behavioral data analysis and causal econometric inference to answer properly. The questions are deliberately broad at this stage; narrowing them in consultation with Prof. Groznik, after a thorough review of the empirical literature and identification of suitable datasets and natural experiments, is a first-year task.

Research Question 1 • Behavioral

How do algorithmic pricing and recommendation mechanisms on digital mobility platforms influence individual travel behavior and modal choice over time, and through which behavioral channels do those effects operate?

Research Question 2 • Market Structure

What role does data asymmetry play in shaping competitive dynamics between mobility platforms, and how does platform scale affect market concentration and barriers to entry in urban transport markets?

Research Question 3 • MaaS

Under what conditions does Mobility-as-a-Service adoption succeed, and how does integrated platform entry affect the strategic behavior of incumbent transport providers and the revealed travel preferences of users?

Research Pipeline

End-to-End Methodology Prototype

The pipeline below represents the full analytical workflow of the proposed dissertation. It maps data sources through processing and modeling layers to the expected empirical and policy outputs. The design is intentional: machine learning handles pattern recognition and behavioral clustering in high-dimensional data; econometric methods handle causal inference where identification is possible. The two traditions are combined rather than used in isolation, which is where the main methodological contribution lies.

research_pipeline.py | Digital Mobility & Platform Economics | PhD Dissertation

# Layer 1 — Data Sources

Source A

Platform Transaction Data

Data-sharing and open-data programs of platforms such as Uber, Grab, and Bolt, where available; trip-level records with origin, destination, time, fare, mode

API · CSV

Source B

GPS Trace Datasets

City open data portals; longitudinal individual-level mobility traces across multiple European cities

GeoJSON · Panel

Source C

User-Generated Content

App Store reviews, Trustpilot, Reddit (r/uber, r/grab); unstructured narrative text at scale

Reddit API · NLP

# Layer 2 — Processing

NLP Pipeline

Text Preprocessing & Embedding

Sentence-BERT embeddings, UMAP dimensionality reduction, HDBSCAN clustering, RoBERTa sentiment

Hugging Face · BERTopic

Feature Engineering

Panel Construction

City-quarter panel assembly; individual trip sequences; lag structures; treatment timing variables

Pandas · plm

Sequence Modeling

Behavioral Trajectory Encoding

Transformer encoder over trip sequences; clustering modal shift patterns; change-point detection

PyTorch · sklearn

# Layer 3 — Analysis (per Research Question)

RQ 1 • Behavioral

Sequence Clustering + DiD

Identify behavioral regime shifts; difference-in-differences on natural experiments (platform entry, pricing shocks)

Transformer · did

RQ 2 • Market Structure

Panel FE + IV Econometrics

Two-way fixed effects; instrumental variables for endogeneity; Callaway-Sant'Anna staggered adoption estimator

fixest · AER

RQ 3 • MaaS

XGBoost + SHAP + RP/SP Survey Data

Adoption driver profiling across city types; SHAP feature attribution; revealed and stated preference fusion

XGBoost · SHAP

# Layer 4 — Output

Expected Deliverables

Dissertation Outputs

Three peer-reviewed articles, one per research question; demand prediction models for practitioners; policy evidence on platform data regulation; open-source pipeline code for replication

Peer-Reviewed Articles (SSCI journals): 3

Doctoral Timeline: 4yr

Combined Methodology (Machine Learning + Econometrics): ML+EC

pipeline: ready

python 3.11 · R 4.3 · pytorch 2.2 · fixest 0.11

applicant: nikbakht@student.usm.my · target: Ljubljana 2026
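The change-point detection step in Layer 2, which feeds the regime-shift identification for RQ 1, can be illustrated with the simplest possible case. The example takes a single mean shift in one user's weekly modal share and locates it by least squares. This is a hedged toy sketch: the series is invented. A real analysis would need multiple change points (for example binary segmentation or PELT, available in libraries such as `ruptures`) and a penalty to guard against spurious splits.

```python
import numpy as np


def best_mean_shift(series, min_size=3):
    """Locate a single mean shift by minimising the within-segment sum of squares.

    Returns (index, cost): the series splits into series[:index] and series[index:].
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    best_idx, best_cost = None, np.inf
    for k in range(min_size, n - min_size + 1):
        left, right = x[:k], x[k:]
        # Total squared deviation of each segment around its own mean
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_idx, best_cost = k, cost
    return best_idx, best_cost


# Hypothetical weekly public-transit share of one user's trips, dropping after week 8
transit_share = [0.62, 0.60, 0.65, 0.61, 0.63, 0.59, 0.64, 0.62,
                 0.35, 0.33, 0.38, 0.31, 0.36, 0.34, 0.37, 0.32]
idx, _ = best_mean_shift(transit_share)
print(idx)  # 8: the new regime starts at week index 8
```

A detected break like this is descriptive only. Attributing it to a platform intervention is the job of the difference-in-differences design paired with it in Layer 3.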

NLP Behavioral Signal Extraction

The NLP pipeline below is a direct extension of the methodology developed in my MBA thesis, adapted for mobility platform review data. The core logic is identical: convert text into semantic embeddings, identify thematic clusters inductively, then classify sentiment per cluster. The adaptation for mobility contexts involves targeting behavioral signals, such as reactions to wait time increases, pricing changes, or interface redesigns, rather than the retail experience dimensions analyzed in the original research.

nlp_mobility_pipeline.py · Python

# Mobility platform review analysis pipeline
# Adapted from: Global Retail CX Analysis (Nikbakht, 2024)
# github.com/inick-tech/Global-Retail-CX-Analysis

import pandas as pd
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer
from transformers import pipeline
from umap import UMAP

# Stage 0: Load reviews (hypothetical file; expects "text" and "date" columns)
df = pd.read_csv("mobility_reviews.csv")
df["date"] = pd.to_datetime(df["date"])
review_corpus = df["text"].astype(str).tolist()

# Stage 1: Semantic embedding
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings  = embed_model.encode(review_corpus, show_progress_bar=True)

# Stage 2: Inductive topic discovery (no predefined categories)
topic_model = BERTopic(
    embedding_model = embed_model,
    umap_model      = UMAP(n_neighbors=15, n_components=5, metric="cosine"),
    hdbscan_model   = HDBSCAN(min_cluster_size=50, prediction_data=True),
    verbose         = True
)
topics, probs = topic_model.fit_transform(review_corpus, embeddings)
df["topic"] = topics   # topic -1 collects HDBSCAN outliers

# Stage 3: Sentiment per behavioral theme
# device=0 runs on the first GPU; use device=-1 for CPU
clf = pipeline(
    "sentiment-analysis",
    model  = "cardiffnlp/twitter-roberta-base-sentiment-latest",
    device = 0
)
# Batched inference; truncation keeps long reviews within the model's input limit
preds = clf(review_corpus, batch_size=32, truncation=True)
df["sentiment"] = [p["label"] for p in preds]

# Stage 4: Share of each sentiment label by topic and calendar month
# (tracks how reactions to platform changes evolve over months)
df["month"] = df["date"].dt.to_period("M")
sentiment_ts = (
    df
    .groupby(["topic", "month"])["sentiment"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
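Stage 4 is the least obvious step, so here is a self-contained toy run of that aggregation (binning by calendar month with `dt.to_period("M")`) on a hand-made DataFrame. The reviews, topic ids, and labels are invented stand-ins for the model outputs of Stages 2 and 3. Each row of the result is a (topic, month) pair, and each column gives the share of reviews carrying that sentiment label, so every row sums to 1.

```python
import pandas as pd

# Toy stand-in for the outputs of Stages 2-3 (invented data)
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-28", "2024-01-30",
                            "2024-02-03", "2024-02-14", "2024-02-25"]),
    "topic": [0, 0, 0, 1, 0, 0, 1],
    "sentiment": ["negative", "negative", "positive", "positive",
                  "neutral", "positive", "negative"],
})

# Share of each sentiment label per topic and calendar month
df["month"] = df["date"].dt.to_period("M")
sentiment_ts = (
    df
    .groupby(["topic", "month"])["sentiment"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(sentiment_ts)
```

Reading down a topic's rows shows how the mix of reactions shifts from month to month. That shift is the signal to be lined up against the timing of pricing changes or interface redesigns.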

Panel Econometrics for Causal Inference

The R code below implements the causal inference component for the market structure research question. The Callaway-Sant'Anna estimator is used because it is robust to treatment effect heterogeneity across cities and cohorts, which is the realistic scenario when different urban markets adopted digital mobility platforms at different times under different regulatory conditions. This is the part of the methodology I am still deepening, and it represents the primary learning objective for the first year of the doctoral program.

causal_panel_analysis.R · R

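# Panel econometrics: platform market structure analysis
# Staggered DiD + Two-Way Fixed Effects
# mobility_panel: hypothetical balanced city-quarter data.frame; quarter, city_id
# and first_treat_quarter must be numeric (first_treat_quarter = 0 if never treated)

library(fixest)    # fast TWFE with clustered SE (Berge 2018)
library(did)       # Callaway-Sant'Anna, robust to treatment-effect heterogeneity

# Two-way FE: absorb city and time unobservables.
# platform_entry is time-invariant and absorbed by the city FE, so only the
# interaction (the static TWFE DiD coefficient) enters; under staggered timing
# with heterogeneous effects this coefficient can be biased, hence att_gt below.
model_twfe <- feols(
    log_market_share ~ platform_entry:post
                     + log_population + transit_density
                     | city_id + quarter,
    data    = mobility_panel,
    cluster = ~city_id
)
etable(model_twfe, tex = TRUE)

# Staggered adoption: heterogeneity-robust group-time ATTs
# Handles cities adopting platforms at different times; covariates should be
# pre-treatment or time-invariant characteristics
cs_att <- att_gt(
    yname         = "log_market_share",
    tname         = "quarter",
    idname        = "city_id",
    gname         = "first_treat_quarter",
    xformla       = ~population + income_pc + transit_freq,
    data          = mobility_panel,
    control_group = "nevertreated",
    est_method    = "dr"   # doubly robust
)

# Event-study aggregation: plot pre/post treatment dynamics
es <- aggte(cs_att, type = "dynamic")
ggdid(es, title = "Effect of Platform Entry on Market Share")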
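The R code above expects a balanced city-quarter panel with numeric time and cohort variables. The `did` package treats `first_treat_quarter = 0` as never treated. A minimal pandas sketch of that Layer 2 construction step follows. The entry dates and city names are hypothetical, and the sketch covers only a single entry event per city. It also builds the `platform_entry` and `post` indicators used in the TWFE specification and one lagged variable as an example of the lag structures mentioned in the pipeline.

```python
import pandas as pd

# Hypothetical platform entry quarters (None = no entry within the sample window)
entry = {"city_a": "2016Q3", "city_b": "2018Q1", "city_c": None}
quarters = pd.period_range("2015Q1", "2019Q4", freq="Q")

# Balanced city-quarter grid with integer ids, as did::att_gt requires
panel = pd.DataFrame(
    [(c, t) for c in entry for t in range(1, len(quarters) + 1)],
    columns=["city", "quarter"],
)
panel["period"] = [quarters[t - 1] for t in panel["quarter"]]  # human-readable label
panel["city_id"] = panel["city"].map({c: i + 1 for i, c in enumerate(entry)})

# Cohort: integer quarter of first entry; 0 marks never-treated cities
first_treat = {
    c: (quarters.get_loc(pd.Period(q, freq="Q")) + 1 if q else 0)
    for c, q in entry.items()
}
panel["first_treat_quarter"] = panel["city"].map(first_treat)

# TWFE indicators: ever-treated city, and quarters on or after that city's entry
panel["platform_entry"] = (panel["first_treat_quarter"] > 0).astype(int)
panel["post"] = (
    (panel["platform_entry"] == 1)
    & (panel["quarter"] >= panel["first_treat_quarter"])
).astype(int)

# Example lag structure: treatment status one quarter earlier, within each city
panel["post_lag1"] = panel.groupby("city")["post"].shift(1, fill_value=0)

print(panel.groupby("city")["post"].sum())
```

Building the cohort variable from a single entry table keeps `post` and `first_treat_quarter` consistent by construction, so the TWFE and Callaway-Sant'Anna specifications are estimated on the same definition of treatment.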

MaaS Adoption Modeling with Interpretable ML

maas_adoption_shap.py · Python

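# MaaS adoption prediction + SHAP interpretability
# Assumes a hypothetical city-level table ("maas_cities.csv") holding the
# features below plus a binary "adopted" outcome column

import pandas as pd
import shap
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from xgboost import XGBClassifier

feature_groups = {
    "behavioral" : ["modal_diversity_idx", "weekly_trips_avg", "pt_share_pre"],
    "structural" : ["city_density", "transit_frequency", "gdp_per_capita"],
    "regulatory" : ["open_data_mandate", "data_sharing_score", "subsidy_level"],
    "platform"   : ["n_competing_platforms", "avg_wait_time", "surge_freq"]
}
features = [f for group in feature_groups.values() for f in group]

df = pd.read_csv("maas_cities.csv")
X, y = df[features], df["adopted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = XGBClassifier(
    n_estimators     = 500,
    max_depth        = 6,
    learning_rate    = 0.05,
    subsample        = 0.8,
    colsample_bytree = 0.8,
    eval_metric      = "auc",
    random_state     = 42
)

# Cross-validated AUC on the training split, then refit on all training data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(clf, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"CV AUC: {cv_auc.mean():.3f} (sd {cv_auc.std():.3f})")
clf.fit(X_train, y_train)

# SHAP: which city-level features drive adoption?
explainer   = shap.TreeExplainer(clf)
explanation = explainer(X_test)          # shap.Explanation, keeps feature names
shap.plots.beeswarm(explanation)         # global attribution across test cities
shap.plots.waterfall(explanation[0])     # attribution for a single city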
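SHAP values are Shapley values from cooperative game theory applied to a model's prediction. For a model with only a few features they can be computed exactly by enumerating feature coalitions, which makes the attribution logic behind the plots above concrete. This sketch illustrates the idea under stated assumptions and is not the `shap` library's algorithm. The three-feature scoring function is hypothetical (loosely echoing the feature names above). It "removes" a feature by setting it to a baseline value, one of several possible conventions. `TreeExplainer`, by contrast, computes its values efficiently from the tree structure.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to their baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their actual value; the rest the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical adoption score with one interaction (transit x data sharing)
def score(z):
    transit, data_sharing, surge = z
    return 0.5 * transit + 0.3 * data_sharing + 0.2 * transit * data_sharing - 0.1 * surge


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(score, x, baseline)
print([round(p, 3) for p in phi])
# Efficiency property: the attributions sum to f(x) - f(baseline)
print(round(sum(phi), 6), round(score(x) - score(baseline), 6))
```

The linear terms are attributed entirely to their own features, while the interaction term is split evenly between the two features involved. That even split is why SHAP attributions for correlated or interacting city-level drivers should be read as shared credit rather than as separate causal effects.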

Expected Contributions

What This Research Aims to Produce

The dissertation is expected to produce a minimum of three published or near-published articles by the end of the funding period, each addressing one of the three research questions. Beyond publications, there are applied and methodological contributions that are worth being explicit about, because they represent goals that go beyond the academic output requirements of the program.

On the behavioral question: a documented empirical account of how algorithmic platform interventions, including pricing changes and interface redesigns, shift individual travel behavior over time. To my knowledge, this does not yet exist in the transport economics literature in a form that draws on longitudinal individual-level data combined with credible causal identification.
On market structure: empirical evidence about the actual magnitude of data network effects in mobility markets, and what scale of data advantage translates into measurable competitive barriers. This is a question regulators across Europe are actively asking without much concrete quantitative work to rely on.
On MaaS adoption: a comparative analysis across city deployments that identifies which behavioral and structural conditions systematically predict adoption success, and how incumbents respond strategically to integrated platform entry. The policy relevance is direct given where European cities currently are in MaaS implementation cycles.
Methodologically: a demonstrated and replicable workflow for combining NLP analysis of user-generated content, behavioral machine learning on longitudinal trip sequences, and panel econometrics for causal inference in a transport and platform economics context. The combination is less common than it should be, and executing it properly would be useful beyond the specific questions of this dissertation.

Fit and Rationale

Why This Position and Why Ljubljana

The honest answer to why I am applying specifically here is that the fit is unusually direct. Prof. Groznik's research group works on the digital transformation of mobility ecosystems, with attention to how digitalization affects market structure, user behavior, and the strategic choices of platform operators. That is not a description I am retrofitting to my interests after reading the call for applications; those are the questions I have been thinking about for the past two years, since my time in Muscat gave me a practitioner's view of what platform-driven mobility transformation actually looks like in an urban market.

What I bring to this position is a combination of things that do not typically travel together. Mathematical training at the undergraduate and graduate levels, including ongoing work on the theoretical foundations of optimization algorithms used in large-scale machine learning. Applied NLP experience, including a manuscript currently under peer review and open-source code implementing the full research pipeline. Industry experience building predictive models in a mobility-adjacent setting, which gave me a practitioner's understanding of what kinds of data exist, how messy they are in practice, and what the gap between an analytical insight and an actionable business decision actually looks like. And a specific substantive interest in how digital platforms reshape mobility markets that has been consistent across the last several years of my work, independent of the particular degree program I happened to be in.

What I am still building is depth in applied econometrics, particularly the causal inference methods needed for credible analysis of panel data in market structure research. This is a gap I cannot fill independently, and I want to be direct about that. The methodological environment at Ljubljana, and specifically the combination of economic theory and quantitative empirical rigor that characterizes Prof. Groznik's group, is the environment in which I can develop that depth properly. A doctoral program is not just a research opportunity; it is a professional formation, and I am applying to this one because it offers the specific formation I need, with the supervisor whose research is closest to the questions I care about.

The timing also matters. I am finishing my MSc at USM in July 2026. The optimization theory work has been more mathematically demanding than I anticipated, and working through the convergence analysis of adaptive gradient methods in high-dimensional non-convex settings has left me with a clearer sense of what rigorous quantitative research actually requires. I feel more ready for doctoral work now than I would have two years ago, and more certain about the direction I want it to take.

Key References

Selected Bibliography

Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
Cramer, J., & Krueger, A. B. (2016). Disruptive change in the taxi business: The case of Uber. American Economic Review, 106(5), 177–182.
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv:2203.05794.
Hensher, D. A. (2017). Future bus transport contracts under a Mobility as a Service (MaaS) regime in the digital age. Transportation Research Part A, 98, 86–96.
Parker, G., Van Alstyne, M., & Choudary, S. P. (2016). Platform revolution. W. W. Norton.
Rochet, J.-C., & Tirole, J. (2003). Platform competition in two-sided markets. Journal of the European Economic Association, 1(4), 990–1029.
Tirachini, A., & Gomez-Lobo, A. (2020). Does ride-hailing increase or decrease vehicle kilometers traveled? International Journal of Sustainable Transportation, 14(3), 187–204.