Evaluation of Acoustic Descriptors Applied to Forensic Speaker Comparison

Adelino Pinheiro Silva, Maurílio Nunes Vieira, Adriano Vilela Barbosa

Abstract


Forensic speaker comparison (FSC) consists of comparing the characteristics of two audio recordings in order to determine whether the speech in a questioned recording can be attributed to a known individual. In most cases the questioned recording comes from telephone interceptions and is GSM-encoded (Global System for Mobile Communications), narrowband, and affected by channel noise. Surveys of worldwide FSC practice carried out in 2011 and 2016, by the University of York and INTERPOL respectively, indicated that many forensic examiners relied on perceptual and acoustic analyses, whereas automatic and assisted methodologies were less widely used. Within this niche, the present work explores the potential of acoustic features/descriptors, such as Mel cepstral components, and analyzes the discriminating power of these acoustic features extracted from a corpus. The experiments used five types of noise at six signal-to-noise ratio levels. The comparison scenarios aim to approximate forensic conditions by considering GSM coding, signal bandwidth, and channel noise.
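As an illustration of how a recording can be degraded to a chosen signal-to-noise ratio, the following is a minimal sketch, not the authors' procedure. The function names `mix_at_snr` and `measured_snr_db`, the synthetic sine-wave stand-in for speech, the white-noise source, and the six SNR values are all assumptions made for the example; the abstract does not state which noise types or SNR levels were used.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR (dB)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    # Average power of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that makes p_speech / (gain^2 * p_noise) equal 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Hypothetical inputs: a 200 Hz tone sampled at 8 kHz stands in for speech.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
noise = rng.standard_normal(8000)

# Hypothetical SNR grid with six levels (values chosen only for illustration).
snr_levels = [0, 5, 10, 15, 20, 25]
noisy = {snr: mix_at_snr(speech, noise, snr) for snr in snr_levels}
```

Each entry of `noisy` can then be passed to a feature extractor, so that the same comparison is repeated under every noise condition.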

Keywords


Forensic speaker comparison, Cepstral analysis, Equal error rate.

References


E. Gold; P. French. International practices in forensic speaker comparison. International Journal of Speech, Language and the Law 18:293–307 (2011).

G.S. Morrison; F.H. Sahito; G. Jardine; D. Djokic; S. Clavet; S. Berghs; C.G. Dorny. Interpol survey of the use of speaker identification by law enforcement agencies. Forensic Science International 263:92–100 (2016).

S.S. Tirumala; S.R. Shahamiri; A.S. Garhwal; R. Wang. Speaker identification features extraction methods: A systematic review. Expert Systems With Applications 90:250–271 (2017).

Y. Kinoshita; S. Ishihara; P. Rose. Exploring the discriminatory potential of F0 distribution parameters in traditional forensic speaker recognition. International Journal of Speech, Language and the Law 16:91–111 (2009).

G.S. Morrison; C. Zhang; P. Rose. An empirical estimate of the precision of likelihood ratios from a forensic-voice-comparison system. Forensic Science International 208:59–65 (2011).

E. Enzinger; G.S. Morrison. Empirical test of the performance of an acoustic-phonetic approach to forensic voice comparison under conditions similar to those of a real case. Forensic Science International 277:30–40 (2017).

R.R.d. Silva; J.P.C.L. da Costa; R.K. Miranda; G. Del Grado. Aplicação do valor de base da frequência fundamental via estatística MVKD em comparação forense de locutor. Revista Brasileira de Criminalística 5:30–38 (2016).

National Research Council; et al. Strengthening Forensic Science in the United States: A Path Forward. National Academies Press (2009), 47–51, 142–154, 183–200.

G.S. Morrison. Forensic voice comparison and the paradigm shift. Science & Justice 49:298–308 (2009).

R. Togneri; D. Pullella. An overview of speaker identification: Accuracy and robustness issues. IEEE Circuits And Systems Magazine 11:23–61 (2011).

S.B. Davis; P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. In Readings in Speech Recognition, 65–74. Elsevier (1990).

D.A. Reynolds. Experimental evaluation of features for robust speaker identification. IEEE Transactions on Speech and Audio Processing 2:639–643 (1994).

D.S. Kim; S.Y. Lee; R.M. Kil. Auditory processing of speech signals for robust speech recognition in real-world noisy environments. IEEE Transactions on Speech and Audio Processing 7:55–69 (1999).

C. Kim; R.M. Stern. Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24:1315–1329 (2016).

H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 87:1738–1752 (1990).

H. Hermansky; N. Morgan. RASTA processing of speech. IEEE Transactions on Speech and Audio Processing 2:578–589 (1994).

M.M. Jam; H. Sadjedi. Identification of hearing disorder by multi-band entropy cepstrum extraction from infant’s cry. In Biomedical and Pharmaceutical Engineering, 2009. ICBPE’09. International Conference on, 1–5. IEEE (2009).

F. Jabloun; A.E. Cetin; E. Erzin. Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Processing Letters 6:259–261 (1999).

R.S. Holambe; M.S. Deshpande. Noise robust speaker identification: using nonlinear modeling techniques. In Forensic Speaker Recognition - Law Enforcement and Counter-Terrorism, 153–182. Springer (2012).

B. Gajic; K.K. Paliwal. Robust parameters for speech recognition based on subband spectral centroid histograms. In Seventh European Conference on Speech Communication and Technology (2001).

J. Kacur; M. Varga; G. Rozinaj. ZCPA features for speech recognition. In Telecommunications (BIHTEL), 2012 IX International Symposium on, 1–4. IEEE (2012).

D.A. Reynolds; T.F. Quatieri; R.B. Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10:19–41 (2000).

R.O. Duda; P.E. Hart; D.G. Stork. Pattern Classification. John Wiley & Sons (2012), 53,136.

C. Petri. Relação entre níveis de significância bayesiano e freqüentista: e-value e p-value em tabelas de contingência. Dissertação de Mestrado, Universidade de São Paulo (2007).

J.M. Stern. Significance tests, belief calculi, and burden of proof in legal and scientific discourse. Advances in Intelligent Systems and Robotics: LAPTEC 2003 101:139 (2003).

G. ITU. Gsm full rate speech transcoding. GSM rec 06.10 (1991).

I. Guyon; S. Gunn; M. Nikravesh; L.A. Zadeh. Feature Extraction: Foundations and Applications, volume 207. Springer (2008), 100.

J. Gonzalez-Rodriguez; P. Rose; D. Ramos; D.T. Toledano; J. Ortega-Garcia. Emulating DNA: Rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing 15:2104–2115 (2007).

G. Casella; R. Berger. Inferência Estatística - Tradução da 2ª edição norte-americana. Cengage Learning (2011), p. 259.




DOI: http://dx.doi.org/10.15260/rbc.v8i2.344

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.