PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC
Abstract
Symbolic music representation is a fundamental challenge in computational musicology. Grid-based representations preserve the spatial correspondence between pitch and time, but their inherent sparsity makes them inefficient to encode. Discrete-event representations are compact, but they fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes a pianoroll through events, combining the structural properties of the grid with the efficiency of event sequences while maintaining temporal dependencies and local spatial patterns. Specifically, we design four complementary event types: Frame Events for temporal boundaries, Gap Events for sparse regions, Pattern Events for note patterns, and Musical Structure Events for musical metadata. Pianoroll-Event strikes an effective balance between sequence length and vocabulary size, improving encoding efficiency by 1.36× to 7.16× over representative discrete sequence methods. Experiments across multiple autoregressive architectures show that models using our representation consistently outperform baselines in both quantitative and human evaluations.
Pianoroll-Event Data Representation
Overview of the conversion from a pianoroll into pianoroll-events. Through frame segmentation, partitioning, and compression, the pianoroll is transformed into a sequence of pianoroll-events that draws on several event types.
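To make the idea of describing a pianoroll through events concrete, here is a minimal toy sketch. It is not the paper's encoder: the function names, the `roll[pitch][time]` layout, the fixed `frame_width`, and the exact form of the `FRAME`, `GAP`, and `PATTERN` tuples are all assumptions made for illustration, and Musical Structure Events are left out. The sketch cuts the pianoroll into fixed-width frames, collapses runs of empty pitch rows into a single gap event, and keeps non-empty rows as pattern events.

```python
from typing import List, Tuple

Event = Tuple[str, object]

def encode_pianoroll(roll: List[List[int]], frame_width: int = 4) -> List[Event]:
    """Toy encoder (hypothetical): roll[pitch][time] holds 0/1 note activity."""
    n_pitches = len(roll)
    n_steps = len(roll[0]) if roll else 0
    events: List[Event] = []
    for start in range(0, n_steps, frame_width):
        # Frame event: marks the temporal boundary of a new frame.
        events.append(("FRAME", start // frame_width))
        gap = 0
        for pitch in range(n_pitches):
            cells = tuple(roll[pitch][start:start + frame_width])
            if not any(cells):
                gap += 1  # extend the current run of empty pitch rows
                continue
            if gap:
                # Gap event: one token stands in for `gap` empty rows.
                events.append(("GAP", gap))
                gap = 0
            # Pattern event: the on/off cells of one pitch row in this frame.
            events.append(("PATTERN", cells))
        # Empty rows after the last pattern are implied by the next FRAME.
    return events

def decode_events(events: List[Event], n_pitches: int, n_steps: int,
                  frame_width: int = 4) -> List[List[int]]:
    """Inverse of the toy encoder, to show that the toy encoding is lossless."""
    roll = [[0] * n_steps for _ in range(n_pitches)]
    start, pitch = 0, 0
    for kind, value in events:
        if kind == "FRAME":
            start, pitch = value * frame_width, 0
        elif kind == "GAP":
            pitch += value
        elif kind == "PATTERN":
            for offset, cell in enumerate(value):
                roll[pitch][start + offset] = cell
            pitch += 1
    return roll

example = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
]
events = encode_pianoroll(example, frame_width=4)
```

In this toy example, the 32 grid cells become 8 events, and `decode_events(events, 4, 8)` recovers the original grid. A real encoder would also need a finite vocabulary for the pattern and gap values, which this sketch does not address.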
Encoding Efficiency Comparison
| Method | Avg. Sequence Length | Vocabulary Size |
|---|---|---|
| Ours | 749.8 | 347 |
| REMI | 1339.7 | 330 |
| MIDILike | 1398.9 | 448 |
| REMI-BPE | 317.8 | 20,000 |
| ABCNotation | 2575.0 | 128 |
Table 1. Encoding comparison across different representation methods, demonstrating the effectiveness of our approach in balancing sequence length and vocabulary size.
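One simple way to read Table 1 is to express each method's sequence length and vocabulary size relative to ours. The sketch below does only that arithmetic on the table values; the names `TABLE_1` and `relative_to_ours` are made up for illustration. It does not reproduce the 1.36× to 7.16× efficiency figure, whose exact definition is not given on this page.

```python
# (average sequence length, vocabulary size), copied from Table 1
TABLE_1 = {
    "Ours": (749.8, 347),
    "REMI": (1339.7, 330),
    "MIDILike": (1398.9, 448),
    "REMI-BPE": (317.8, 20000),
    "ABCNotation": (2575.0, 128),
}

def relative_to_ours(table):
    """Return (length ratio, vocabulary ratio) of every method against 'Ours'."""
    base_len, base_vocab = table["Ours"]
    return {
        name: (length / base_len, vocab / base_vocab)
        for name, (length, vocab) in table.items()
    }

for name, (len_ratio, vocab_ratio) in relative_to_ours(TABLE_1).items():
    print(f"{name:12s} length x{len_ratio:.2f}  vocabulary x{vocab_ratio:.2f}")
```

Read this way, the trade-off is visible in the numbers: REMI-BPE produces shorter sequences than ours (about 0.42× the length) but needs a vocabulary roughly 58× larger, while ABCNotation has the smallest vocabulary and the longest sequences.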
Generated Songs
The following musical pieces were generated by our best-performing model, a Transformer decoder with the Llama architecture, conditioned only on time signature and BPM. The results demonstrate the considerable potential of our representation method.
MIDI Demo
LSTM
GPT SMALL
GPT LARGE
LLAMA
BibTeX
@article{YourPaperKey2026,
title={PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC},
author={Lekai Qian and Haoyu Gu and Dehan Li and Boyu Cao and Qi Liu},
journal={...},
year={2026},
url={...}
}