Publications
Publications, grouped by year in reverse chronological order.
2025
- arXiv. LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging. Petr Grinberg, Eric Bezzam, Paolo Prandoni, and Martin Vetterli. arXiv preprint arXiv:2509.16418, 2025.
With society’s increasing reliance on digital data sharing, the protection of sensitive information has become critical. Encryption is one such privacy-preserving method; however, its realization in the audio domain predominantly relies on signal processing or software methods embedded into hardware. In this paper, we introduce LenslessMic, a hybrid optical hardware-based encryption method that uses a lensless camera as a physical layer of security applicable to multiple types of audio. We show that LenslessMic enables (1) robust authentication of audio recordings and (2) encryption strength that can rival the search space of 256-bit digital standards, while maintaining high-quality signals and minimal loss of content information. The approach is validated with a low-cost Raspberry Pi prototype and is open-sourced together with datasets to facilitate research in the area.
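The physical-key idea can be pictured with a toy simulation: the lensless camera multiplexes an audio-derived image through its point spread function (PSF), and only a party holding the PSF can invert the measurement. A minimal sketch follows, assuming a circular-convolution camera model and Wiener-style inversion; it illustrates the concept, not the actual LenslessMic optics or reconstruction.

```python
# Toy sketch: lensless imaging modeled as convolution with a PSF that acts as
# a physical key. Illustrative assumptions throughout; not the paper's pipeline.
import numpy as np

rng = np.random.default_rng(0)

def encrypt(audio_image, psf):
    """Simulate a lensless measurement: circular convolution with the PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(audio_image) * np.fft.fft2(psf)))

def decrypt(measurement, psf, eps=1e-3):
    """Wiener-style inversion; only feasible when the PSF (the key) is known."""
    H = np.fft.fft2(psf)
    wiener = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(np.fft.fft2(measurement) * wiener))

image = rng.standard_normal((64, 64))   # stand-in for an audio-derived image
psf = rng.standard_normal((64, 64))     # the optical "key"
recovered = decrypt(encrypt(image, psf), psf)
print(np.max(np.abs(recovered - image)))  # small error with the correct key
```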
@article{grinberg2025lenslessmic,
  title   = {LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging},
  author  = {Grinberg, Petr and Bezzam, Eric and Prandoni, Paolo and Vetterli, Martin},
  journal = {arXiv preprint arXiv:2509.16418},
  year    = {2025},
}

- Interspeech. A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations. Petr Grinberg, Ankur Kumar, Surya Koppisetti, and Gaurav Bharaj. In Interspeech 2025, 2025.
Evaluating explainability techniques, such as SHAP and LRP, in the context of audio deepfake detection is challenging due to the lack of clear ground-truth annotations. In the cases where we are able to obtain the ground truth, we find that these methods struggle to provide accurate explanations. In this work, we propose a novel data-driven approach to identify artifact regions in deepfake audio. We consider paired real and vocoded audio, and use the difference in their time-frequency representations as the ground-truth explanation. The difference signal then serves as supervision to train a diffusion model to expose the deepfake artifacts in a given vocoded audio. Experimental results on the VocV4 and LibriSeVoc datasets demonstrate that our method outperforms traditional explainability techniques, both qualitatively and quantitatively.
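The supervision target described here is simple to construct; below is a minimal sketch assuming paired, time-aligned waveforms of equal length (the STFT settings and function names are illustrative, not the paper's exact configuration).

```python
# Minimal sketch: ground-truth "artifact map" as the difference of
# log-magnitude spectrograms of paired real and vocoded recordings.
import numpy as np
from scipy.signal import stft

def artifact_map(real, vocoded, fs=16000, nperseg=512):
    """Assumes `real` and `vocoded` are time-aligned, equal-length waveforms."""
    _, _, Z_real = stft(real, fs=fs, nperseg=nperseg)
    _, _, Z_voc = stft(vocoded, fs=fs, nperseg=nperseg)
    diff = np.abs(np.log1p(np.abs(Z_voc)) - np.log1p(np.abs(Z_real)))
    return diff  # large values mark time-frequency regions with vocoder artifacts

# This map is the supervision the diffusion model is trained to reproduce for a
# given vocoded input; the diffusion training itself is beyond this sketch.
```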
@inproceedings{grinberg2025data,
  title     = {{A Data-Driven Diffusion-based Approach for Audio Deepfake Explanations}},
  author    = {Grinberg, Petr and Kumar, Ankur and Koppisetti, Surya and Bharaj, Gaurav},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {5348--5352},
  doi       = {10.21437/Interspeech.2025-2105},
  issn      = {2958-1796},
}

- ICASSP. What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain. Petr Grinberg, Ankur Kumar, Surya Koppisetti, and Gaurav Bharaj. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.
Adding explanations to audio deepfake detection (ADD) models will boost their real-world adoption by providing insight into the decision-making process. In this paper, we propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We compare against standard Grad-CAM and SHAP-based methods, using quantitative faithfulness metrics as well as a partial spoof test, to comprehensively analyze the relative importance of different temporal regions in an audio signal. We consider large datasets, unlike previous works where only limited utterances are studied, and find that the XAI methods differ in their explanations. The proposed relevancy-based XAI method performs best overall on a variety of metrics. Further investigation of the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that XAI results obtained from analyzing limited utterances do not necessarily hold when evaluated on large datasets.
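For intuition on relevancy-style aggregation over a transformer's layers, here is a sketch of attention rollout (Abnar & Zuidema, 2020), a simpler relative of the relevancy propagation used in the paper; it only illustrates how importance can be traced back to input time frames.

```python
# Sketch of attention rollout: propagate head-averaged attention maps through
# the layers to score input time frames. The paper's relevancy method is more
# involved; this shows only the general layer-to-layer aggregation idea.
import numpy as np

def attention_rollout(attn_per_layer):
    """attn_per_layer: list of head-averaged attention maps, each (T, T)."""
    T = attn_per_layer[0].shape[0]
    rollout = np.eye(T)
    for A in attn_per_layer:
        A_hat = A + np.eye(T)  # account for residual connections
        A_hat = A_hat / A_hat.sum(axis=-1, keepdims=True)
        rollout = A_hat @ rollout
    # Averaging over output positions yields one relevancy per input frame.
    return rollout.mean(axis=0)
```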
@inproceedings{grinberg2025does,
  title        = {{What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain}},
  author       = {Grinberg, Petr and Kumar, Ankur and Koppisetti, Surya and Bharaj, Gaurav},
  booktitle    = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year         = {2025},
  organization = {IEEE},
}
2023
- IEEE Access. RawSpectrogram: On the Way to Effective Streaming Speech Anti-spoofing. Petr Grinberg and Vladislav Shikhov. IEEE Access, 2023.
Traditional anti-spoofing systems cannot be used straightforwardly with streaming audio because they are designed for finite utterances. Such offline models can be applied in streaming with the help of buffering; however, they are inefficient in terms of memory and computation. We propose a novel approach called RawSpectrogram that makes offline models streaming-friendly without a significant drop in quality. The method was tested on RawNet2 and AASIST, resulting in new architectures called RawRNN (RawLSTM and RawGRU), RS-AASIST, and TAASIST. The RawRNN-type models are much smaller and achieve a lower Equal Error Rate than their base architecture, RawNet2. RS-AASIST and TAASIST have fewer parameters than AASIST and achieve similar quality. We also validated our concept for models with time-frequency transform front-ends and for automatic speaker verification systems by proposing RECAPA-TDNN, based on ECAPA-TDNN. RS-AASIST and RECAPA-TDNN were combined into the first streaming-friendly spoofing-aware speaker verification system reported in the literature. This joint system achieves significantly better quality than the corresponding offline solutions. All our models require far fewer floating-point operations per score update. RawSpectrogram significantly reduces prediction latency and allows the system to update the probability with each new chunk from the stream, preserving all information from the past. To the best of our knowledge, TAASIST is the most successful voice anti-spoofing system that employs a vanilla Transformer trained using supervised learning.
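The chunk-wise update at the heart of the streaming setup can be sketched with a toy recurrent scorer: the hidden state carries everything heard so far, and the spoofing probability is refreshed after every chunk. The model and layer sizes below are made up for illustration and are not the actual RawRNN or RS-AASIST architectures.

```python
# Toy chunk-wise streaming scorer: a GRU consumes feature frames chunk by
# chunk; its hidden state preserves the past, so the score is updated per
# chunk without re-processing the whole utterance. Not the paper's model.
import torch
import torch.nn as nn

class StreamingScorer(nn.Module):
    def __init__(self, feat_dim=40, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def step(self, chunk, state=None):
        """chunk: (1, frames, feat_dim). Returns updated score and state."""
        out, state = self.gru(chunk, state)
        score = torch.sigmoid(self.head(out[:, -1]))  # prob after this chunk
        return score, state

model = StreamingScorer()
state = None
for chunk in torch.randn(5, 1, 10, 40):  # five 10-frame chunks from a stream
    score, state = model.step(chunk, state)
    print(float(score))                   # the score is refreshed per chunk
```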
@article{grinberg2023rawspectrogram,
  title     = {{RawSpectrogram: On the Way to Effective Streaming Speech Anti-spoofing}},
  author    = {Grinberg, Petr and Shikhov, Vladislav},
  journal   = {IEEE Access},
  year      = {2023},
  publisher = {IEEE},
  doi       = {10.1109/ACCESS.2023.3321919},
}
2022
- arXiv. A Comparative Study of Fusion Methods for SASV Challenge 2022. Petr Grinberg and Vladislav Shikhov. arXiv preprint arXiv:2203.16970, 2022.
An Automatic Speaker Verification (ASV) system is a form of biometric authentication. It can be attacked by an intruder who falsifies data to gain access to protected information. Countermeasures (CM) are special algorithms that detect these spoofing attacks. While the ASVspoof Challenge series focused on the development of CM for a fixed ASV system, the organizers of the new Spoofing-Aware Speaker Verification (SASV) Challenge believe that the best results can be achieved if the CM and ASV systems are optimized jointly. One approach to joint optimization is fusion over embeddings or scores obtained from the ASV and CM models. The baselines of SASV Challenge 2022 present two types of fusion: score-sum and a back-end ensemble with a 3-layer MLP. This paper describes our study of other fusion methods, including boosting over embeddings, which has not been used in anti-spoofing studies before.
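Both fusion flavors are easy to picture. The sketch below shows the score-sum baseline and boosting over concatenated ASV/CM embeddings; the embedding dimensions, data, and the specific boosted classifier are illustrative assumptions, not the challenge code.

```python
# Minimal sketch of two SASV fusion strategies: (1) score-sum, which simply
# adds the ASV and CM scores, and (2) boosting over concatenated embeddings.
# Shapes, data, and the classifier choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def score_sum(asv_scores, cm_scores):
    return asv_scores + cm_scores            # baseline fusion: plain addition

rng = np.random.default_rng(0)
n = 200
asv_emb = rng.standard_normal((n, 192))      # e.g. speaker embeddings
cm_emb = rng.standard_normal((n, 160))       # e.g. countermeasure embeddings
labels = rng.integers(0, 2, size=n)          # 1 = target bona fide trial

X = np.concatenate([asv_emb, cm_emb], axis=1)
fusion = GradientBoostingClassifier().fit(X, labels)
sasv_scores = fusion.predict_proba(X)[:, 1]  # fused SASV score per trial
```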
@article{grinberg2022comparative,
  title   = {{A Comparative Study of Fusion Methods for SASV Challenge 2022}},
  author  = {Grinberg, Petr and Shikhov, Vladislav},
  journal = {arXiv preprint arXiv:2203.16970},
  year    = {2022},
  doi     = {10.48550/arXiv.2203.16970},
}