PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC

Lekai Qian*, Haoyu Gu*, Dehan Li*, Boyu Cao, Qi Liu†
South China University of Technology

*Indicates Equal Contribution, †Indicates Corresponding Author

Abstract

Symbolic music representation is a fundamental challenge in computational musicology. While grid-based representations effectively preserve pitch-time spatial correspondence, their inherent data sparsity leads to low encoding efficiency. Discrete-event representations achieve compact encoding but fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations through events, combining structural properties with encoding efficiency while maintaining temporal dependencies and local spatial patterns. Specifically, we design four complementary event types: Frame Events for temporal boundaries, Gap Events for sparse regions, Pattern Events for note patterns, and Musical Structure Events for musical metadata. Pianoroll-Event strikes an effective balance between sequence length and vocabulary size, improving encoding efficiency by 1.36× to 7.16× over representative discrete-event methods. Experiments across multiple autoregressive architectures show that models using our representation consistently outperform baselines in both quantitative and human evaluations.

Pianoroll-Event Data Representation


The process of converting a pianoroll representation into pianoroll-events. Through frame segmentation, partitioning, and compression, the pianoroll is transformed into a sequence of pianoroll-events of several types.
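The caption names the three operations but not the exact token layout, so the Python sketch below shows one plausible reading of them under our own assumptions; it is not the authors' implementation. We assume a binary pianoroll indexed as `roll[t][p]`, fixed-length frames of `frame_len` time steps, pitch blocks of `block_size` pitches whose on/off bits form an integer pattern ID, and time signature plus BPM emitted up front as Musical Structure events. The function name `encode_pianoroll` and the event names (`FRAME`, `GAP`, `PATTERN`, `TIMESIG_NUM`, `TIMESIG_DEN`, `BPM`) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch of a pianoroll-to-event encoder; not the authors' code.
Event = Tuple[str, int]  # (event type, integer value)


def encode_pianoroll(
    roll: List[List[int]],
    frame_len: int = 4,
    block_size: int = 8,
    meta: Tuple[int, int, int] = (4, 4, 120),
) -> List[Event]:
    """Encode a binary pianoroll, where roll[t][p] == 1 means pitch p sounds at step t."""
    if any(len(col) % block_size for col in roll):
        raise ValueError("pitch count must be a multiple of block_size")

    num, den, bpm = meta
    # Assumed Musical Structure events: time signature and BPM up front.
    events: List[Event] = [("TIMESIG_NUM", num), ("TIMESIG_DEN", den), ("BPM", bpm)]
    gap = 0  # consecutive empty pitch blocks not yet emitted

    def flush_gap() -> None:
        nonlocal gap
        if gap:
            events.append(("GAP", gap))
            gap = 0

    for t, column in enumerate(roll):
        # Frame segmentation: a FRAME event every frame_len time steps.
        if t % frame_len == 0:
            flush_gap()
            events.append(("FRAME", t // frame_len))
        # Partitioning: cut the pitch axis into blocks of block_size pitches.
        for start in range(0, len(column), block_size):
            pattern = 0
            for bit in column[start:start + block_size]:
                pattern = (pattern << 1) | bit  # block's on/off bits as an integer ID
            if pattern == 0:
                gap += 1  # compression: runs of empty blocks merge into one GAP
            else:
                flush_gap()
                events.append(("PATTERN", pattern))
    flush_gap()
    return events


# Usage: 16 pitches, 4 time steps, frames of 2 steps, blocks of 8 pitches.
roll = [[0] * 16 for _ in range(4)]
roll[0][0] = 1  # pitch index 0 sounds at step 0
roll[2][9] = 1  # pitch index 9 sounds at step 2
events = encode_pianoroll(roll, frame_len=2, block_size=8)
print(events)
```

In this sketch, a run of empty pitch blocks costs a single GAP token carrying its length rather than one token per block, which is where the savings on sparse pianorolls come from. Because every time step has the same number of blocks and GAP runs are closed at each frame boundary, the event stream can still be decoded back to the original grid.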

Encoding Efficiency Comparison

Method         Avg. Sequence Length   Vocabulary Size
Ours                  749.8                  347
REMI                 1339.7                  330
MIDILike             1398.9                  448
REMI-BPE              317.8               20,000
ABCNotation          2575.0                  128

Table 1. Average sequence length and vocabulary size for each representation method. Our approach balances the two: its sequences are shorter than those of REMI, MIDILike, and ABCNotation, while its vocabulary is far smaller than that of REMI-BPE.
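To make the trade-off in Table 1 concrete, the short Python snippet below divides each baseline's average sequence length and vocabulary size by ours. These are plain ratios of the tabulated numbers, offered only as an illustration; they are not necessarily the encoding-efficiency metric behind the 1.36× to 7.16× range quoted in the abstract. The names `table` and `ratios` are illustrative.

```python
# Average sequence lengths and vocabulary sizes copied from Table 1.
table = {
    "Ours": (749.8, 347),
    "REMI": (1339.7, 330),
    "MIDILike": (1398.9, 448),
    "REMI-BPE": (317.8, 20_000),
    "ABCNotation": (2575.0, 128),
}

ours_len, ours_vocab = table["Ours"]

# (length ratio, vocabulary ratio) per baseline; a value > 1 means the
# baseline's sequences (or vocabulary) are larger than ours.
ratios = {
    method: (length / ours_len, vocab / ours_vocab)
    for method, (length, vocab) in table.items()
    if method != "Ours"
}

for method, (length_ratio, vocab_ratio) in ratios.items():
    print(f"{method:12s} length x{length_ratio:.2f}  vocab x{vocab_ratio:.2f}")
```

By these raw ratios, REMI-BPE is the only baseline with shorter sequences (about 0.42× our length), but it pays for this with a vocabulary roughly 58 times larger.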

Generated Songs

The following pieces were generated by our best-performing model, a Transformer decoder with the Llama architecture, conditioned only on time signature and BPM. They illustrate the potential of our representation method.

MIDI Demo

LSTM

GPT SMALL

GPT LARGE

LLAMA

BibTeX

@article{YourPaperKey2026,
  title={PIANOROLL-EVENT: A NOVEL SCORE REPRESENTATION FOR SYMBOLIC MUSIC},
  author={Qian, Lekai and Gu, Haoyu and Li, Dehan and Cao, Boyu and Liu, Qi},
  journal={...},
  year={2026},
  url={...}
}