Research Blog

DSL Usability Research

In my previous post, I asserted:

...learning a new formal language can itself contribute to the difficulty of encoding an experiment.

This statement was based on assumptions, intuitions, and folk wisdom. I started digging into the DSL usability research to see if I could find explicit support for this statement. This blog post is about what I found.

Suppose I have a DSL for a task that was previously manual. I want to conduct a user study. I decide to use some previously validated instrument to measure differences in perceived difficulty of encoding/performing a task (\(D\)), and vary the method used to code the task (\(M=\text{DSL}\) vs. \(M=\text{manual}\)). Suppose there is no variability in task difficulty for now: the specific task is fixed for the duration of the study, i.e., is controlled.

Ideally, I'd like to just measure the effect of \(M\) on \(D\); we are going to abuse plate notation¹ a bit and say that the following graph denotes "method has an effect on perceived difficulty of performing a specific task for the population of experts in the domain of that task:"

flowchart LR
  M("Method ($$M$$)")
  subgraph ppl [domain experts]
      D("Percieved Difficulty of Task ($$D$$)")
  end
  M --> D

The first obvious problem is that \(D\) is a mixture of some "inherent" difference due to \(M\) and the novelty of the method/context/environment/situation (\(N\)). We have not included \(N\) in our model; let's do so now:

flowchart LR
  M("Method ($$M$$)")
  subgraph ppl [domain experts]
    direction TB
    N("Novelty ($$N$$)")
    D("Perceived Difficulty of Task ($$D$$)")
  end
  M --> D
  N --> D

Conducting a naïve study results in \((D \vert M=\text{manual}, N = 0)\) vs. \((D \vert M=\text{DSL}, N \gg 0)\). This is why we have the study participants perform a training task first: it's an attempt to lower \(N\) as much as possible, i.e., to control for novelty.
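
To make the confound concrete, here is a toy simulation in Python; every number in it (the baseline, the method effect, how strongly novelty inflates difficulty, how much a training task reduces it) is made up purely for illustration. The point is only that a naive comparison attributes the novelty penalty to the method, while reducing \(N\) first lets the method effect show through.

import random

def perceived_difficulty(method, novelty, rng):
    """Toy model: D = baseline + method effect + novelty effect + noise."""
    baseline = 5.0
    method_effect = -1.0 if method == "DSL" else 0.0   # assume the DSL genuinely helps
    novelty_effect = 2.0 * novelty                      # novelty inflates perceived difficulty
    return baseline + method_effect + novelty_effect + rng.gauss(0, 0.5)

rng = random.Random(0)

# Naive study: manual is familiar (N = 0), the DSL is brand new (N = 1).
naive_manual = [perceived_difficulty("manual", 0.0, rng) for _ in range(100)]
naive_dsl    = [perceived_difficulty("DSL",    1.0, rng) for _ in range(100)]

# With a training task that (optimistically) drives novelty down to 0.1.
trained_dsl  = [perceived_difficulty("DSL",    0.1, rng) for _ in range(100)]

mean = lambda xs: sum(xs) / len(xs)
print(f"manual:        {mean(naive_manual):.2f}")
print(f"DSL (naive):   {mean(naive_dsl):.2f}")    # looks *harder* than manual
print(f"DSL (trained): {mean(trained_dsl):.2f}")  # the method effect becomes visible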

Training tasks are obviously not unique to DSL research; however, there are other tactics for reducing novelty that are unique to programming systems. For example, it seems obvious that IDE features like syntax highlighting and autocomplete that are part of a "normal" programming environment would reduce the value of \(N\); so would integrating the DSL into the target users' existing toolchain/workflow.

If we allow the task to vary, then our model needs to include another potential cause for \(D\):

flowchart LR
  M("Method ($$M$$)")
  C("Task Complexity ($$C$$)")
  subgraph ppl [domain experts]
    direction TB
    N("Novelty ($$N$$)")
    D("Perceived Difficulty of Task ($$D$$)")
  end
  M --> D
  N --> D
  C --> D

The details of how we represent \(C\) matter: whatever scale we use, it contains a baked-in assumption that for any two tasks \(t_1\) and \(t_2\) where \(t_1\not=t_2\) but \(C(t_1)=C(t_2)\), we can treat \(t_1\equiv t_2\). This is a big assumption! What if there are qualitative differences between tasks, not captured by the complexity metric, that influence \(D\)? In that case, we may want to use a different variable to capture \(C\), perhaps a binary feature vector, or we may want to split \(C\) into a collection of distinct variables. Maybe task complexity isn't objective but subjective, in which case we would want to include it in the domain experts plate. Maybe we want to forgo \(C\) altogether and instead treat tasks as a population we need to sample over, e.g.,

flowchart LR
  M("Method ($$M$$)")
  subgraph ppl [domain experts]
    subgraph tasks [tasks]
      direction TB
      N("Novelty ($$N$$)")
      D("Perceived Difficulty of Task ($$D$$)")
    end
  end
  M --> D
  N --> D
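
To make the earlier equivalence assumption about \(C\) concrete, here is a toy sketch; the tasks, features, and scalar complexity metric are all invented for illustration. Two tasks can agree on the scalar score while differing on qualitative features that plausibly influence \(D\), which is exactly the gap a feature-vector (or multi-variable) representation of \(C\) is meant to close.

from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    n_steps: int              # contributes to the scalar complexity score
    requires_recursion: bool  # qualitative features *not* captured by the score
    requires_io: bool

def complexity(t: Task) -> int:
    """A made-up scalar complexity metric."""
    return t.n_steps

t1 = Task("aggregate survey responses", n_steps=7, requires_recursion=False, requires_io=True)
t2 = Task("walk a nested config tree",  n_steps=7, requires_recursion=True,  requires_io=False)

# The scalar metric says the tasks are interchangeable...
assert complexity(t1) == complexity(t2)
# ...but a feature-vector view says otherwise.
print((t1.requires_recursion, t1.requires_io) == (t2.requires_recursion, t2.requires_io))  # False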

I have plenty more to say and would love to iterate on the design of this hypothetical user study, but I am going to stop here because the above diagram feels like something that should already be established in the literature. Like a lot of folk wisdom, it's suggested, implied, assumed, and (I think!) generally accepted, but so far I have not found any explicit validation of the above schema. That doesn't mean it isn't out there; it means that (a) there isn't a single canonical paper accepted by the community as evidence and (b) where the evidence does exist, it's embedded in work that primarily addresses some other research question.

So, for now, I am putting together a DSL usability study reading list of works that I think touch on this fundamental problem in meaningful ways. I consider Profiling Programming Language Learning and PLIERS: A Process that Integrates User-Centered Methods into Programming Language Design to be seed papers and have gotten recommendations from Andrew McNutt, Shriram Krishnamurthi, and Lindsey Kuper. Please feel free to add to this (or use it yourself!). I look forward to writing a follow-up post on what I find. :)


  1. While the plate notation here looks similar to the output that Helical produces for HyPL code, the specific graphs are more precise than those that Helical can currently produce. For example, only \(D\) is embedded in the domain experts plate. Helical's current implementation would place both \(M\) and \(D\) in this plate. 

Jupyter DSLs

One of the broader goals of the Helical project is to make writing, maintaining, and debugging experiments easier and safer for the end-user through a novel domain-specific language. However, learning a new formal language can itself contribute to the difficulty of encoding an experiment. Therefore, we are interested in mitigating the effects of language learning/novelty. To this end, a Northeastern co-op student (Kevin G. Yang) spent last year investigating the suitability of using Jupyter notebooks as an execution environment for experiments.

Jupyter notebooks are commonly used by empiricists. If we want empiricists to use Helical, then it would make sense to integrate it into empiricists' computational workflow. Kevin began investigating the feasibility of adding support for features such as syntax highlighting and code completion to Jupyter. This actually turned out to be a surprisingly difficult task!
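
To give a flavor of what is involved, here is a minimal sketch of a custom lexer for a hypothetical DSL using Pygments; the token rules, keywords, and the "hypl" name are invented for illustration, and this covers only one layer of the stack (e.g., rendering exported notebooks). Live highlighting inside the notebook editor goes through a separate front-end component, which is part of what makes the end-to-end task harder than it first appears.

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexer import RegexLexer
from pygments.token import Comment, Keyword, Name, Number, String, Text

class HypotheticalDSLLexer(RegexLexer):
    """Lexer for a made-up experiment DSL (not Helical's actual grammar)."""
    name = "hypl"
    aliases = ["hypl"]
    filenames = ["*.hypl"]

    tokens = {
        "root": [
            (r"#.*?$", Comment.Single),                            # line comments
            (r"\b(experiment|condition|measure|assign)\b", Keyword),
            (r'"[^"]*"', String),                                  # string literals
            (r"\d+(\.\d+)?", Number),
            (r"[A-Za-z_][A-Za-z0-9_]*", Name),                     # identifiers
            (r"\s+", Text),
        ],
    }

src = 'experiment "pilot"\nassign method\nmeasure difficulty  # 7-point scale'
print(highlight(src, HypotheticalDSLLexer(), HtmlFormatter()))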

Kevin ended up doing a deep dive into the Jupyter code base and issue database, resulting in an experience report and tutorial that he presented internally at the Northeastern Programming Research Lab's seminar series and externally at PLATEAU 2025. While his co-op focused on a specific implementation task, the work led us to ask new research questions. For example, we were somewhat surprised by the breadth of tooling empirical scientists were using, and by the organic demand for custom syntax highlighting within the Jupyter user base; conventional wisdom in the PL community is that DSLs are a bit niche! Thus, rather than focusing on Helical specifically, we broadened the task to DSL support in Jupyter more generally.

At the start of his co-op, I had envisioned Kevin integrating Helical into Jupyter and then pivoting to a reproduction study. However, as he worked on the project, he became increasingly interested in visualization and usability. We had hoped to perform a user study in Summer 2025 to further investigate some of the research questions that arose, and perhaps send a conference paper submission to CHI or UIST; that thread is on hold as Kevin continues his career exploration journey.

Introduction to Digital Twins

Recently, I've been reading about this new technology called digital twins. I started with this paper. I think it's a great introduction, and it's also the one that my research supervisor has recommended to me.

I had no idea what a digital twin was. I had not heard the phrase at all, and my first impression was that it sounded similar to an NFT: a digital representation of a real-world physical object.

As I continued reading the paper, I found out that, no, it's not like that. Digital twins are online clones of something in the physical world, but there are huge differences. A digital twin is alive: it gathers real-time data from the object or system it represents in the real world, so it keeps updating itself. An NFT, by contrast, is an online object that does not change; immutability is one of its key properties.

Something that is talked about quite a bit online is what digital twins offer over regular simulations, which are already heavily used across many areas. They have similar features: both gather data, both try to model what something in the real (or even digital) world might do, and both try to predict the future. The real difference is this: a simulation is fed its data up front and then tries to predict everything afterwards based on that data, whereas a digital twin is more of a progression. The digital twin evolves as its physical twin progresses in the real world, like twins that grow and change basically simultaneously. That's the point of a digital twin: you get a real-time digital clone of something in the real world, whether that is an object, a system, or anything that produces data you can replicate digitally.
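
Here is a toy sketch of that distinction (names and numbers entirely made up): the simulation is seeded once and then runs forward on its own model, while the digital twin keeps folding fresh measurements from the physical object into its state before making each prediction.

def simulate(initial_temp, steps):
    """Classic simulation: seeded with data up front, then runs on its own model."""
    temp = initial_temp
    history = []
    for _ in range(steps):
        temp += 0.5           # model-only prediction; never corrected
        history.append(temp)
    return history

class DigitalTwin:
    """Toy twin: its state is continually updated from real-world measurements."""
    def __init__(self, initial_temp):
        self.temp = initial_temp

    def ingest(self, measured_temp):
        self.temp = measured_temp  # stay synchronized with the physical object

    def predict_next(self):
        return self.temp + 0.5     # predictions always start from the latest state

print(simulate(20.0, 3))                # [20.5, 21.0, 21.5]

twin = DigitalTwin(20.0)
for reading in [20.4, 21.7, 21.1]:      # streaming sensor readings
    twin.ingest(reading)
    print(twin.predict_next())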

On a side note, there seem to be a lot of buzzwords that can be fitted into presenting digital twins in this paper. I've read a lot about AI, LLMs, machine learning, and different models; I've even seen blockchain and security... It seems like it could potentially grab investors' attention one day.

How digital twins relate to social media --- YSocial

When I was reading the paper on digital twins, I was wondering how this technology could fit within the scope of the research I am doing in this co-op, since the paper's examples of digital twins were all manufacturing-related. After reading this paper, I realized why my research supervisor wanted me to learn about these systems. The paper presents "YSocial", a digital twin platform that replicates an online social media environment.

In short, YSocial allows users to simulate a social media environment using LLMs. One possible use case would be to simulate political discussions on platforms like Twitter. It does this by using LLM agents to mimic how real-world humans discuss topics (in this case, political ones) on social media. Thanks to the development of AI, this can be done far more easily than it could have been a decade ago.
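
The core loop is conceptually simple. The sketch below is not YSocial's actual API or prompt design, just a stand-in to show the shape of the idea, with the LLM call stubbed out (in practice it would go to a hosted model or a local one via something like Ollama).

import random

def llm_reply(persona, thread):
    """Stand-in for a real LLM call; a deployment would query an actual model here."""
    return f"[{persona} replying to: {thread[-1][:40]}...]"

personas = ["cautious moderate", "fiery partisan", "policy wonk"]
thread = ["Seed post: what should the city do about transit funding?"]

rng = random.Random(42)
for _ in range(5):
    persona = rng.choice(personas)  # pick which simulated user speaks next
    thread.append(llm_reply(persona, thread))

print("\n".join(thread))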

It is also a huge playground for researchers to gain insights into how well LLMs actually perform at mimicking humans. YSocial lets researchers gather large amounts of data and try out different settings, both to see whether today's LLMs can actually reproduce what a human-led online environment would look like and to see which settings can be adjusted to achieve different results on different social media platforms.

For example, Instagram is not the same concept as Twitter: Instagram is more photo- and image-based, while Twitter gives you more opportunity to express your feelings. Instagram is more of a record, a place where you keep track of what you have done; you can put all of your photos, and even your stories (your recent activities), on it. By adjusting different sliders and personalities, researchers can change how the agents on the YSocial platform behave. It could potentially be a very powerful tool for researchers to dive into.

Trying out YSocial

I've tried to play around with YSocial and to set it up locally. I ran across a few problems that I was unable to solve on my own, so I reached out to the team behind YSocial and got some very helpful feedback!

Initially, while I was able to access the main dashboard of YSocial, I ran into problems afterwards. After following the instructions on that page, creating different experiments and agent populations, and actually activating the simulations, I was unable to get any of the results or any of the posts that I assumed the agents would have created for this simulation of Twitter. Instead, when I entered the simulation, I only got a posting page where I could act as a user of the social media platform and post something myself. It was initially very strange to me that a tool meant to simulate social media didn't actually give me any posts or any of the data that comes along with such a simulation. I later learned that this process can take a long time.

On the website, I also saw that there is a "hard way" to do it, where I would have to set up the server and the client separately. I presume that this allows for more customization of the setup. I suspected, and confirmed with the authors, that the documentation here is a bit outdated. I will have to play around and find out a lot more about the tool itself, see where some things have fallen out of date, and update them in the code. I have not successfully run my YSocial setup this way yet, so hopefully in the coming few days I'll be able to find out what is wrong in the documentation or in the code for the "hard way."

One thing I discovered while trying YSocial is that some of the features on the web dashboard do not seem to work offline, even when I set up a local LLM through Ollama. When trying to create "experiments" and "populations" without a network connection, the dashboard does not display the already-created ones, instead showing only empty tables. A big todo for me is to fully go through the codebase (YSocial is open source) to see where this problem lies.

Looking ahead, something that my research supervisor and I would like to do is have YSocial communicate with Mastodon, which is the social media platform we are focusing on. The goal is to set up a locally run Mastodon instance where YSocial's LLM agents act as the users on that instance. We would then try to figure out the various kinds of things we can play around with. Starting off, we would see what happens when we change some of the factors or configurations on Mastodon, especially ones related to privacy. These changes might affect how Mastodon is used, but on the actual Mastodon we cannot test the effects of these changes on real users. With LLM agents, we can try to simulate the effects of these changes in a less harmful and less dangerous way.
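
As a first sketch of what the Mastodon side of that bridge could look like, the snippet below uses the Mastodon.py client library to post agent-generated text to a locally hosted test instance; the instance URL, token, persona, and the stubbed generation function are all placeholders, and this covers only the posting half (YSocial, or a local model, would supply the actual text).

from mastodon import Mastodon  # Mastodon.py client library

def generate_agent_post(persona):
    """Placeholder for the LLM-agent side; YSocial or a local model would go here."""
    return f"({persona}) testing a simulated post on a throwaway local instance"

# Placeholder credentials for a locally hosted, throwaway Mastodon instance.
client = Mastodon(
    access_token="AGENT_BOT_TOKEN",
    api_base_url="https://mastodon.local.test",
)

status = client.status_post(
    generate_agent_post("policy wonk"),
    visibility="unlisted",  # keep simulated traffic off public timelines
)
print(status["url"])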

Looking at this tool, I imagine it is wildly powerful and useful if used correctly. As I figure out how some of the features of this platform actually work, I believe I'll be able to gather a lot of simulated data about social media and AI, and especially the combination of the two. I hope to utilize this tool in various areas, including but not limited to Mastodon.

New student, new project!

I want to extend a belated welcome to Zixuan (Jason) Yu, a Northeastern University undergraduate student who is working with me on a research co-op through December 2025. Jason's project focuses on identifying elements of the Mastodon code base where we might want to intervene (in order to answer a research question) or where there might be associated privacy considerations.

Jason's project combines goals from the Privacy Narratives project and the Helical project. He will be posting here regularly, but before then, let's discuss the connection between privacy and experimentation.

As Donald Campbell wrote in Methods for the Experimenting Society,

[Social p]rograms are ... continued or discontinued on inadequate ... grounds[,] ... due in part to the inertia of social organizations, in part to political predicaments which oppose evaluation, and in part to the fact that the methodology of evaluation is still inadequate.

Mastodon, as both a software platform and a collection of communities, has less of this inertia. We can think of each Mastodon instance as being a little society. This multiplicity and diversity could present incredible opportunities for empowering citizen social scientists. Insofar as computing systems can be made to have less friction with respect to experimentation, Mastodon's role as a FOSS platform seems ripe with opportunity. Unfortunately, in this context, Campbell's vision for an experimenting society may sound a bit like moving fast and breaking things: an ethos that many Fediverse communities reject.

This is NOT what we want! At first blush, it would seem that a notion of trust is missing from Campbell's essay. On closer inspection, however, we find trust's cousin: participant/citizen privacy. Campbell's mentions of privacy focus on scenarios where participants might be unwilling to disclose information to the researcher or research organization; today there are many more parties that may violate trust. We would argue that a violation of trust is the primary harm (or at least the primary perceived harm) that experimentation in social networks can cause, and that privacy therefore cannot be treated as a separate concern.

It is with this context in mind that Jason will be identifying possible intervention points and scenarios that could cause privacy vulnerabilities in the Mastodon code base.

Welcome!

This is the first in what I hope will be many blog posts on the relationship between experimentation and programming languages.

First, the meta: there are two ways to categorize the work that this blog will cover. One is by the methods used: this work exists at the intersection of formal and empirical research methods and so some posts will be quite technical, focusing on formal language design, logic, causal inference, and general "mathiness." In a lot of ways, this isn't particularly useful information: we might as well say we are "doing science," since empirical and formal methods describe a lot of the tools we use to generate scientific knowledge. On the other hand, that framing distinguishes the goals of this work from ordinary academic scientific activity, where we typically use a small set of methods for a specialized set of domains, which leads us to the observation that...

The other way to categorize this work is by application area. We are broadly interested in "systems," which is a general word to describe objects in the world that interact with each other. We are specifically interested in computer-mediated systems, which range from sociotechnical systems like Mastodon and social games, to typical computer systems like databases, operating systems, or programming frameworks. These application areas provide contexts in which we can evaluate the methods we study, while also providing inspiration for the methodological problems we address.