pyin

Abstract

Fundamental tracking implementing the pYIN algorithm

Description

pYIN is a modificatin of the YIN algorithm for f0 estimation. In the first step of pYIN, F0 candidates and their probabilities are computed using the YIN algorithm. In the second step, Viterbi decoding is used to estimate the most likely F0 sequence and voicing flags. This streaming version uses only past observations (no lookahead) and is thus somewhat less effective than offline pyin.

Syntax

kfreq, kconfidence, kvoiced pyin asig, Sarg1, ivalue1, [Sarg2, ivalue2, ...]

Arguments

asig: audio signal
Sarg1: argument name, see below
ivalue1: argument value

Parameter	Range / Default	Description
framesize	1024-8192 (2048)	Analysis frame size in samples
hop	64- (1/4 framesize)	Hop size in samples
fmin	20 -	Min. frequency
fmax	20 -	Max. frequency
bins	2 - 50	Subdivisions per semitone
transition_weight	0-1 (0.1)	Prob. of switching voiced↔unvoiced
minrms	0-1	Mark sound with rms below this as silent
beta_a	1 - 3	param a for beta distr. prior over thresholds.
beta_b	1 - 20	param b for beta distr. Lower=smoother track
drift	10- (100)	Pitch variation between frames, in cents
voiced_obs_floor	0-1	Floor for voiced observations
octave_cost	0-0.5 (0.)	Cost of jumping an octave between frames. Use if the algorithm falsely predicts the 2nd overtone as the fundamental
subharmonic_tresh	4	Used together with octave_cost, controls the threshold of a downward octave jump

Output

kfreq: detected frequency. Only valid if confidence is > ~0.4
kconfidence: detection confidence
kvoiced: is the sound voiced

Syntax

kfreq, kconfidence, kvoiced pyinf0 asig, iminfreq=60, imaxfreq=1000, ibufsize=2048, ioverlap=4, ktransprob=0.99, ibins=4, kdrift=5

Arguments

asig: audio signal
iminfreq: min. frequency for f0 (default=60)
imaxfreq: max. frequency for f0 (default=1000)
ibufsize: size of the analysis frame (default=2048)
ioverlap: overlapping frames. hopsize=bufsize/overlap (default=4)
ktransprob: hmm transition probability (default=0.99)
ibins: number of bins per semitone (default=4)
kdrift: pitch drift between frames, in semitones (default=5)

Execution Time

Performance

Examples

<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>

<CsInstruments>
sr     = 44100
ksmps  = 64
nchnls = 2
0dbfs  = 1

/* example file for pyin

Syntax:

kfreq, kconfidence, kvoiced pyin asig, Sarg1, ivalue1, [Sarg2, ivalue2, ...]

Args:
    * asig: audio signal
    * Sarg1: an argument name, see below
    * ivalue1: the argument value

Output:
    * kfreq: detected frequency. Only valid if confidence is > ~0.4
    * kconfidence: detection confidence
    * kvoiced: is the sound voiced


Parameter          Range          Default
-----------------  ------------  -----------------
framesize          1024 - 8192    2048
hop                64 -           1/4 of framesize
fmin               20 -           60
fmax               20 -           1000
bins               2 - 50         10
transition_weight  0. - 1.        0.1
minrms             0. - 1.        0.
beta_a             1 - 3          2
beta_b             1 - 20         3
drift              10 -           100
voiced_obs_floor   0-1            0.
octave_cost        0.             0. - 0.5
subharmonic_tresh  4              1 - 8

*/


instr 1
  asig1 = oscili:a(0.5, 500)
  ; asig2 = diskin2("../../else/examples/finnegan01.flac", 1, 0, 1)[0]
  asig2 = diskin2("../../else/examples/voiceover-fragment-48k.flac", 1, 0, 1)[0]
  asig3 = buzz(0.1, 300, 7, -1)
  asig4 = pinker() * 0.1
  Snames[] fillarray "sine  ", "speech", "buzz  ", "pink  "
  ksource init 0
  if metro(1/3) == 1 then
    ksource = (ksource + 1) % 4
  endif
  ; asig = picksource(ksource, asig1, asig2, asig3, asig4)
  asig = asig2
  ; ar compress2 aasig, acsig, kthresh, kloknee, khiknee, kratio, katt, krel, ilook
  asigpyin = compress2:a(asig, asig, -90, -40, -20, 6, 0.01, 0.5, 0.02)
  asigpyin *= 3
  ; asigpyin = asig
  kpitch, kconf, kvoiced pyin2 asigpyin, "framesize", 2048, "hop", 256, \
    "fmin", 70, "fmax", 600, "transition_weight", 0.01, "beta_b", 1.6, \
    "drift", 150, "voiced_obs_floor", 0.1
  ksound = schmitt(dbamp(rms(asig)),  -45, -60);
  kenv = schmitt:k(kconf, 0.3, 0.2) * schmitt:k(kpitch, 90, 75) * ksound;
  if metro(24) == 1 && kenv > 0 then
    printsk "pitch: %.1f, conf: %d, voiced: %d, sound: %d\n", kpitch, kconf*100, kvoiced, ksound
  endif
  outch 1, asigpyin * 0.9
  outch 2, vco2(0.1, kpitch) * a(kenv)
endin

</CsInstruments>

<CsScore>
i1 1 20

</CsScore>
</CsoundSynthesizer>

LISTEN

Metadata

Author: Eduardo Moguillansky
Year: 2026
Plugin: else
Source: https://github.com/csound-plugins/csound-plugins/blob/master/src/else/src/else.c

pyin

Abstract

Description

Syntax

Arguments

Output

Syntax

Arguments

Execution Time

Examples

See also

Metadata