This page accompanies my formal PhD application to the Young Researcher position at the School of Economics and Business, University of Ljubljana. It presents my research background, the intellectual problem I intend to pursue, the methodological pipeline I plan to develop, and a prototype of the expected research output, all prepared under the supervision of Prof. Groznik in the area of digital mobility transformation and platform economics.
My academic trajectory is not a straight line, and I think it is worth being direct about that. I started in pure mathematics, moved into an MBA with a substantive quantitative component, spent nearly a year working as a data analyst in Muscat, and am now finishing an MSc in Mathematics at USM with a thesis on scalable optimization algorithms for high-dimensional machine learning. The common thread across all of that, which I only understood clearly in retrospect, is a sustained interest in what rigorous quantitative methods can tell us about complex systems where human behavior and algorithmic processes interact.
My MSc thesis at USM develops tighter convergence analyses for adaptive gradient methods, specifically Adam and its variants, in high-dimensional non-convex settings. The practical motivation comes from fine-tuning large language models for text classification, where instability during training is a documented problem whose mathematical explanation is less well understood than the engineering workarounds practitioners use. The theoretical framework is complete, and the applied experiments are being extended to additional datasets. Submission is scheduled for July 2026.
My MBA thesis, completed at Kharazmi University, applied BERTopic and a fine-tuned RoBERTa classifier to over 50,000 Reddit discussions about global retail customer experience. The research produced a three-category thematic map of customer discourse, validated statistically, and identified a methodologically interesting null result: the Emotional dimension that prior keyword-based analysis had treated as a discrete category does not form a coherent cluster in transformer-based analysis, which suggests it is a property of all categories rather than a category in itself. A manuscript is currently under review. The full codebase is open-source on GitHub.
Before returning to graduate study at USM, I spent nearly a year at EDGE Company in Muscat building end-to-end predictive pipelines for business decision-making in a city where ride-hailing platforms had, within a short period, fundamentally restructured how people moved around. Working there gave me a practitioner's understanding of what kinds of mobility data exist in the real world, how imperfect and fragmented that data is, and how large the gap is between a descriptive insight and an actionable causal claim. That gap is the methodological problem this dissertation is designed to address.
Urban mobility platforms are not neutral intermediaries. They do not simply connect drivers and passengers at a price the market clears. They sit between supply and demand and actively manage both sides simultaneously, allocating drivers algorithmically, nudging users toward particular behaviors through interface design and dynamic pricing, and building proprietary datasets that no external researcher can directly observe. The economic and social consequences of this are significant, uneven across cities and population groups, and still not well understood in the academic literature.
The specific observation that pushed me into this research area came from my time in Muscat. Uber and Careem had changed commuting patterns in that city within a few years. Certain neighborhoods had become meaningfully more accessible than before. Public transit had lost ridership. Some informal transport providers had been pushed out of the market. Most of the platform users I spoke with understood surge pricing in a vague way, but the deeper logic of how the algorithms shaped their choices, waiting times, and longer-run travel behavior was entirely invisible to them. As someone spending his days building predictive models in the same city, I found that invisibility methodologically fascinating. The data to understand what was happening existed inside those platforms. The tools to extract behavioral signal from large, imperfect datasets existed in the NLP and machine learning literature. The econometric methods to draw causal conclusions from that evidence existed in applied economics. What was missing was a research program that brought all three together with sufficient methodological rigor.
The digital mobility ecosystem is also changing faster than the academic literature can track. Mobility-as-a-Service platforms are being deployed unevenly across European cities with striking and poorly explained differences in adoption outcomes. Regulators are attempting to govern algorithmic pricing without a clear conceptual framework for what they are trying to prevent or encourage. Platform operators are building competitive advantages around data accumulation, and the empirical evidence for how strong and durable those advantages are is surprisingly thin. These are live questions, not historical ones, and the academic literature has given practitioners relatively little concrete guidance.
Urban mobility has been reorganized more fundamentally in the past decade than at any point since the mass adoption of the automobile. Platforms like Uber, Grab, and Bolt do not simply move people from one point to another; they actively structure both sides of the market in real time, while public transit systems are simultaneously being asked to integrate with private platforms in ways that nobody fully designed and that we still do not fully understand. The result is an ecosystem with serious economic and social consequences that is considerably harder to study than traditional transport markets, because the most important data sits inside private platforms and the most interesting questions are causal ones that require more than descriptive analysis.
Most studies of platform competition in mobility focus on market entry and price effects. Far fewer ask how algorithmic systems, specifically recommendation engines, surge pricing mechanisms, loyalty incentives, and interface nudges, actually change individual travel behavior over time. Users do not simply react to prices; they are shaped by the systems they interact with repeatedly, and the cumulative effects of that shaping on modal choice and mobility demand are not well documented. The methodological challenge is that tracking this rigorously requires longitudinal individual-level data, which is genuinely difficult to obtain outside of platform-internal research programs.
It is widely assumed in the platform economics literature that larger mobility platforms benefit from data network effects: more users produce more trip data, which improves demand forecasting, which improves service quality, which attracts more users. The feedback loop is theoretically plausible and the implications for market concentration are serious. The empirical evidence, however, is thinner than the theoretical confidence would suggest, partly because quantifying data advantages requires trip-level granularity that regulators and researchers rarely access, and partly because the causal mechanisms are difficult to isolate from other scale economies.
Mobility-as-a-Service platforms, which bundle different transport modes under a single interface and payment system, are being deployed across European cities with striking differences in adoption rates that existing models do not explain well. We have limited understanding of which behavioral and structural conditions predict adoption success, how a new MaaS entrant reshapes competition between incumbent providers, and what regulatory or institutional features enable or obstruct multimodal integration. This is simultaneously a market design question and a behavioral one, and it requires both types of evidence to answer credibly.
These three gaps translate into three interconnected research questions that will organize the dissertation. They share a common thread: all of them are ultimately about how digital platforms change behavior and market structure in mobility systems, and all of them require a combination of behavioral data analysis and causal econometric inference to answer properly. The questions are deliberately broad at this stage; narrowing them in consultation with Prof. Groznik, after a thorough review of the empirical literature and identification of suitable datasets and natural experiments, is a first-year task.
How do algorithmic pricing and recommendation mechanisms on digital mobility platforms influence individual travel behavior and modal choice over time, and through which behavioral channels do those effects operate?
What role does data asymmetry play in shaping competitive dynamics between mobility platforms, and how does platform scale affect market concentration and barriers to entry in urban transport markets?
Under what conditions does Mobility-as-a-Service adoption succeed, and how does integrated platform entry affect the strategic behavior of incumbent transport providers and the revealed travel preferences of users?
The pipeline below represents the full analytical workflow of the proposed dissertation. It maps data sources through processing and modeling layers to the expected empirical and policy outputs. The design is intentional: machine learning handles pattern recognition and behavioral clustering in high-dimensional data; econometric methods handle causal inference where identification is possible. The two traditions are combined rather than used in isolation, which is where the main methodological contribution lies.
The NLP pipeline below is a direct extension of the methodology developed in my MBA thesis, adapted for mobility platform review data. The core logic is identical: convert text into semantic embeddings, identify thematic clusters inductively, then classify sentiment per cluster. The adaptation for mobility contexts involves targeting behavioral signals, such as reactions to wait time increases, pricing changes, or interface redesigns, rather than the retail experience dimensions analyzed in the original research.
# Mobility platform review analysis pipeline
# Adapted from: Global Retail CX Analysis (Nikbakht, 2024)
# github.com/inick-tech/Global-Retail-CX-Analysis
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from transformers import pipeline
from umap import UMAP
from hdbscan import HDBSCAN
import pandas as pd

# Assumed input (not defined in this excerpt): df is a DataFrame of platform
# reviews with a "text" column (str) and a "date" column (datetime64).
review_corpus = df["text"].tolist()

# Stage 1: Semantic embedding
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embed_model.encode(review_corpus, show_progress_bar=True)

# Stage 2: Inductive topic discovery (no predefined categories)
topic_model = BERTopic(
    umap_model=UMAP(n_neighbors=15, n_components=5, metric="cosine"),
    hdbscan_model=HDBSCAN(min_cluster_size=50, prediction_data=True),
    embedding_model=embed_model,
    verbose=True,
)
topics, probs = topic_model.fit_transform(review_corpus, embeddings)

# Stage 3: Sentiment per behavioral theme
clf = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    device=0,  # first CUDA GPU; use device=-1 to run on CPU
)
# Batch the corpus and truncate reviews that exceed the model's input length
results = clf(review_corpus, truncation=True, batch_size=32)
df["sentiment"] = [r["label"] for r in results]
df["topic"] = topics  # same row order as review_corpus

# Stage 4: Aggregate sentiment by topic and time window
# (tracks how reactions to platform changes evolve over months)
sentiment_ts = (
    df
    .groupby(["topic", df["date"].dt.to_period("M")])["sentiment"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
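To make the Stage 4 output concrete, the sketch below computes the same quantity, the share of each sentiment label within a topic and month, on a handful of invented records. It is dependency-free on purpose so the arithmetic is visible; the review records and the `sentiment_shares` helper are hypothetical and are not part of the original codebase.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical labelled reviews: (topic id, review date, sentiment label).
reviews = [
    (0, date(2024, 1, 5), "negative"),
    (0, date(2024, 1, 20), "negative"),
    (0, date(2024, 1, 28), "positive"),
    (0, date(2024, 2, 3), "positive"),
    (1, date(2024, 1, 11), "neutral"),
    (1, date(2024, 2, 14), "negative"),
    (1, date(2024, 2, 25), "negative"),
]


def sentiment_shares(rows):
    """Return {(topic, 'YYYY-MM'): {label: share}}; shares in a cell sum to 1."""
    counts = defaultdict(Counter)
    for topic, day, label in rows:
        counts[(topic, day.strftime("%Y-%m"))][label] += 1
    labels = sorted({label for _, _, label in rows})
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Labels absent from a cell get 0.0, mirroring .unstack().fillna(0).
        shares[key] = {label: counter[label] / total for label in labels}
    return shares


shares = sentiment_shares(reviews)
for key in sorted(shares):
    print(key, shares[key])
```

Each row of the result corresponds to one topic-month cell, so a shift in the negative share of a pricing-related topic after a fare change would show up as a movement along the time axis within that topic.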
The R code below implements the causal inference component for the market structure research question. The Callaway-Sant'Anna estimator is used because it is robust to treatment effect heterogeneity across cities and cohorts, which is the realistic scenario when different urban markets adopted digital mobility platforms at different times under different regulatory conditions. This is the part of the methodology I am still deepening, and it represents the primary learning objective for the first year of the doctoral program.
# Panel econometrics: platform market structure analysis
# Staggered DiD + Two-Way Fixed Effects
library(fixest)  # fast TWFE with clustered SE (Berge 2018)
library(did)     # Callaway-Sant'Anna, robust to heterogeneity

# Two-way FE baseline: absorb city and time unobservables.
# Main effects of platform_entry and post that are collinear with the
# fixed effects are dropped by feols; the interaction is the DiD term.
model_twfe <- feols(
  log_market_share ~ platform_entry * post
    + log_population + transit_density
  | city_id + quarter,
  data    = mobility_panel,
  cluster = ~city_id
)
etable(model_twfe, tex = TRUE)

# Staggered adoption: heterogeneity-robust ATT
# Handles cities adopting platforms at different times.
# att_gt expects numeric quarter and city_id, and first_treat_quarter = 0
# for never-treated cities.
cs_att <- att_gt(
  yname      = "log_market_share",
  tname      = "quarter",
  idname     = "city_id",
  gname      = "first_treat_quarter",
  xformla    = ~population + income_pc + transit_freq,
  data       = mobility_panel,
  est_method = "dr"  # doubly robust
)

# Event-study aggregation: plot pre/post treatment dynamics
es <- aggte(cs_att, type = "dynamic")
ggdid(es, title = "Effect of Platform Entry on Market Share")
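The logic of the group-time estimator can be checked by hand on a small synthetic panel. The sketch below is a simplified, unconditional version of the idea (no covariates, no doubly robust adjustment, never-treated cities as the comparison group), not a reimplementation of the `did` package. The cities, effect sizes, and the `group_time_att` and `event_study` helpers are all hypothetical; the panel has no noise, so the estimates recover the built-in effects exactly.

```python
from statistics import mean

PERIODS = range(1, 7)  # quarters 1..6

# Hypothetical cities: (first treated quarter or None if never treated, city fixed effect).
CITIES = {
    "A": (3, 10.0), "B": (3, 12.0),
    "C": (5, 8.0), "D": (5, 9.0),
    "E": (None, 11.0), "F": (None, 7.0),
}
EFFECT_PER_QUARTER = {3: 2.0, 5: 0.5}  # effect grows with exposure and differs by cohort
TREND = 1.0                            # common time trend (parallel trends hold by design)


def true_effect(g, t):
    """Built-in treatment effect for cohort g in quarter t."""
    if g is None or t < g:
        return 0.0
    return EFFECT_PER_QUARTER[g] * (t - g + 1)


panel = {
    (city, t): fe + TREND * t + true_effect(g, t)
    for city, (g, fe) in CITIES.items()
    for t in PERIODS
}


def group_time_att(g, t):
    """ATT(g, t): outcome change since quarter g-1 for cohort g, minus the
    same change for never-treated cities."""
    base = g - 1

    def avg_change(group):
        return mean(panel[(city, t)] - panel[(city, base)] for city in group)

    treated = [c for c, (gc, _) in CITIES.items() if gc == g]
    controls = [c for c, (gc, _) in CITIES.items() if gc is None]
    return avg_change(treated) - avg_change(controls)


def event_study(exposure):
    """Simple average of ATT(g, g + exposure) over cohorts observed at that
    exposure (cohorts are equal-sized here, so no size weighting is needed)."""
    cohorts = [g for g in EFFECT_PER_QUARTER if g + exposure in PERIODS]
    return mean(group_time_att(g, g + exposure) for g in cohorts)


print(group_time_att(3, 3))  # 2.0: first treated quarter, early cohort
print(group_time_att(5, 6))  # 1.0: second treated quarter, late cohort
print(group_time_att(5, 3))  # 0.0: placebo in a pre-treatment quarter
print(event_study(0))        # 1.25: average on-impact effect across cohorts
```

Because each cohort is compared only with never-treated cities, already-treated cities never serve as controls, which is the property that matters when effects differ across cohorts and grow with exposure.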
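# MaaS adoption prediction + SHAP interpretability
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
import shap

feature_groups = {
    "behavioral": ["modal_diversity_idx", "weekly_trips_avg", "pt_share_pre"],
    "structural": ["city_density", "transit_frequency", "gdp_per_capita"],
    "regulatory": ["open_data_mandate", "data_sharing_score", "subsidy_level"],
    "platform":   ["n_competing_platforms", "avg_wait_time", "surge_freq"],
}
# Flatten the groups into the model's column list
features = [f for group in feature_groups.values() for f in group]

# Assumed inputs (not defined in this excerpt): X_train and X_test are
# DataFrames containing the columns in `features`; y_train is a binary
# adoption-success label.
clf = XGBClassifier(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric="auc",
    random_state=42,
)

# Cross-validated training: 5-fold stratified AUC, then refit on all training data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(clf, X_train[features], y_train, cv=cv, scoring="roc_auc")
print(f"CV AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
clf.fit(X_train[features], y_train)

# SHAP: which city-level features drive adoption?
explainer = shap.TreeExplainer(clf)
explanation = explainer(X_test[features])  # shap.Explanation: values, base values, data
shap.plots.beeswarm(explanation)           # global view across all test observations
shap.plots.waterfall(explanation[0])       # breakdown for the first test observation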
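What a SHAP value measures can be shown by brute force on a toy example. The sketch below computes exact Shapley values for a hypothetical three-feature adoption score by averaging each feature's marginal contribution over all feature orderings, using a single baseline input as the reference point. That value function is a simplification chosen for clarity; the scoring function and its coefficients are invented for illustration and are not a fitted model.

```python
from itertools import permutations

FEATURES = ["city_density", "transit_frequency", "subsidy_level"]


def score(x):
    """Hypothetical adoption score with one interaction term (not a fitted model)."""
    return (
        2.0 * x["city_density"]
        + 1.0 * x["transit_frequency"]
        + 3.0 * x["city_density"] * x["subsidy_level"]
    )


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over every ordering in which features are switched from baseline to x."""
    phi = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]
            updated = model(current)
            phi[f] += (updated - previous) / len(orderings)
            previous = updated
    return phi


baseline = {"city_density": 0.0, "transit_frequency": 0.0, "subsidy_level": 0.0}
city = {"city_density": 1.0, "transit_frequency": 2.0, "subsidy_level": 1.0}

phi = shapley_values(score, city, baseline)
for name, value in phi.items():
    print(f"{name}: {value:.2f}")
```

The interaction term is split evenly between the two features involved, and the attributions sum to the difference between the city's score and the baseline score, which is the additivity property that makes a waterfall plot readable.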
The dissertation is expected to produce a minimum of three published or near-published articles by the end of the funding period, each addressing one of the three research questions. Beyond publications, there are applied and methodological contributions that are worth being explicit about, because they represent goals that go beyond the academic output requirements of the program.
On the behavioral question: a documented empirical account of how algorithmic platform interventions, including pricing changes and interface redesigns, shift individual travel behavior over time. This does not currently exist in the transport economics literature in a form that draws on longitudinal individual-level data combined with credible causal identification.
On market structure: empirical evidence about the actual magnitude of data network effects in mobility markets, and what scale of data advantage translates into measurable competitive barriers. This is a question regulators across Europe are actively asking without much concrete quantitative work to rely on.
On MaaS adoption: a comparative analysis across city deployments that identifies which behavioral and structural conditions systematically predict adoption success, and how incumbents respond strategically to integrated platform entry. The policy relevance is direct given where European cities currently are in MaaS implementation cycles.
Methodologically: a demonstrated and replicable workflow for combining NLP analysis of user-generated content, behavioral machine learning on longitudinal trip sequences, and panel econometrics for causal inference in a transport and platform economics context. The combination is less common than it should be, and executing it properly would be useful beyond the specific questions of this dissertation.
The honest answer to why I am applying specifically here is that the fit is unusually direct. Prof. Groznik's research group works on the digital transformation of mobility ecosystems, with attention to how digitalization affects market structure, user behavior, and the strategic choices of platform operators. That is not a description I am retrofitting to my interests after reading the call for applications; those are the questions I have been thinking about for the past two years, since my time in Muscat gave me a practitioner's view of what platform-driven mobility transformation actually looks like in an urban market.
What I bring to this position is a combination of things that do not typically travel together. Mathematical training at the undergraduate and graduate levels, including ongoing work on the theoretical foundations of optimization algorithms used in large-scale machine learning. Applied NLP experience, including a manuscript currently under peer review and open-source code implementing the full research pipeline. Industry experience building predictive models in a mobility-adjacent setting, which gave me a practitioner's understanding of what kinds of data exist, how messy they are in practice, and what the gap between an analytical insight and an actionable business decision actually looks like. And a specific substantive interest in how digital platforms reshape mobility markets that has been consistent across the last several years of my work, independent of the particular degree program I happened to be in.
What I am still building is depth in applied econometrics, particularly the causal inference methods needed for credible analysis of panel data in market structure research. This is a gap I cannot fill independently, and I want to be direct about that. The methodological environment at Ljubljana, and specifically the combination of economic theory and quantitative empirical rigor that characterizes Prof. Groznik's group, is the environment in which I can develop that depth properly. A doctoral program is not just a research opportunity; it is a professional formation, and I am applying to this one because it offers the specific formation I need, with the supervisor whose research is closest to the questions I care about.
The timing also matters. I am finishing my MSc at USM in July 2026. The optimization theory work has been more mathematically demanding than I anticipated, and working through the convergence analysis of adaptive gradient methods in high-dimensional non-convex settings has left me with a clearer sense of what rigorous quantitative research actually requires. I feel more ready for doctoral work now than I would have two years ago, and more certain about the direction I want it to take.
Cramer, J., & Krueger, A. B. (2016). Disruptive change in the taxi business: The case of Uber. American Economic Review, 106(5), 177–182.
Hensher, D. A. (2017). Future bus transport contracts under a Mobility as a Service (MaaS) regime in the digital age. Transportation Research Part A, 98, 86–96.
Parker, G., Van Alstyne, M., & Choudary, S. P. (2016). Platform revolution. W. W. Norton.
Rochet, J.-C., & Tirole, J. (2003). Platform competition in two-sided markets. Journal of the European Economic Association, 1(4), 990–1029.
Tirachini, A., & Gomez-Lobo, A. (2020). Does ride-hailing increase or decrease vehicle kilometers traveled? International Journal of Sustainable Transportation, 14(3), 187–204.
Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv:2203.05794.