dyllonj

Research and writings on language models, AI systems, and cognition.

Research

Measuring LLM Personality: GPT-5.2 vs Claude Opus 4.5

January 2026 · Research

Across 4,368 personality evaluations, the two frontier models differ systematically. Claude Opus 4.5 scores higher in Openness (+4.5) and Curiosity; GPT-5.2 leads in Conscientiousness (+5.3). Effect sizes range from moderate to large (Hedges' g = 0.4–0.8).

Cross-Vendor Personality Comparison: Grok, GPT-5.2, Claude Opus 4.5

January 2026 · Research

9,325 evaluations across xAI, OpenAI, and Anthropic. Cross-vendor personality effects are 3–4x larger than within-vendor variation. Grok exhibits 3x the context sensitivity of GPT-5.2. PCA reveals three components explaining 79.5% of variance.

COSER: Steering Vectors Override Fine-Tuning in Strategic Games

2026 · Preliminary Results

Personality-steering vectors can orthogonally modify strategic behavior in role-playing AI agents. Most strikingly, the vectors override literary persona conditioning: a negative Openness vector made the Joker behave rigidly despite fine-tuning on his chaotic persona (p < 0.0001).

Essays

The Convergent Mind

August 2025

Multi-agent orchestration inevitably produces collective identity formation. Team bonding becomes personality erosion. On preserving structured diversity in AI agent ensembles.