Discourse-aware Psycholinguistic Modeling for Bangla Fake News Detection
A comprehensive framework integrating fine-grained linguistic features, discourse analysis, and pre-trained Bangla transformer representations for interpretable fake news detection.
Methodology
Our framework combines pre-trained Bangla BERT representations with 17 psycholinguistic features and 5 discourse-level indicators.
Training: 42,022 samples
Validation: 9,082 samples
Test: 9,082 samples
Class Ratio: 24.8:1 (Authentic:Fake)
Emotional Markers: 4 features
Uncertainty Indicators: 3 features
Cognitive Load: 5 features
Deception Patterns: 5 features
Emotional Markers
- Positive sentiment ratio (Excellent, Extraordinary)
- Negative sentiment ratio (Terrible, Dangerous)
- Fear expressions (Fear, Panic)
- Anger indicators (Anger, Rage)
Uncertainty Indicators
- Hedging language (Perhaps, Maybe)
- Uncertainty expressions (Not certain, Unclear)
- Qualification markers (Somewhat, To a great extent)
Cognitive Load Markers
- Repetition ratios
- Disfluency indicators (That is, Meaning)
- Average sentence length
- Vocabulary richness (type-token ratio)
- Average word length
Deception-Specific Patterns
- Self-reference ratios (I, My)
- Other-reference patterns (He/She, They)
- Present tense usage
- Formal vs informal language markers
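The lexicon-based ratios and surface statistics listed above could be computed roughly as follows. This is a minimal sketch, not the authors' extractor: the lexicons are hypothetical English stand-ins for the Bangla word lists (which are not given here), and the names `LEXICONS`, `tokenize`, and `extract_features` are illustrative assumptions.

```python
import re

# Hypothetical English stand-ins for the Bangla lexicons (assumption).
LEXICONS = {
    "negative": {"terrible", "dangerous"},
    "hedging": {"perhaps", "maybe"},
    "self_reference": {"i", "my"},
}

def tokenize(text):
    # Split on whitespace and common punctuation, including the Bangla danda.
    # A plain \w+ pattern is avoided because it breaks on Bangla vowel signs.
    return re.findall(r"[^\s.,!?;:।\"'()]+", text.lower())

def split_sentences(text):
    # Sentence boundaries: '.', '!', '?' and the Bangla danda.
    return [s for s in re.split(r"[.!?।]+", text) if s.strip()]

def extract_features(text):
    tokens = tokenize(text)
    n_tokens = len(tokens) or 1          # guard against empty input
    n_sentences = len(split_sentences(text)) or 1
    # One ratio per lexicon: matching tokens divided by all tokens.
    features = {
        f"{name}_ratio": sum(tok in words for tok in tokens) / n_tokens
        for name, words in LEXICONS.items()
    }
    features["type_token_ratio"] = len(set(tokens)) / n_tokens
    features["avg_word_length"] = sum(len(tok) for tok in tokens) / n_tokens
    features["avg_sentence_length"] = len(tokens) / n_sentences
    return features
```

Calling `extract_features(article_text)` returns a dictionary of named values, which keeps each feature traceable when explaining a classification decision.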
Semantic Coherence
Cosine similarity between BERT embeddings of adjacent paragraphs, capturing consistency patterns.
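A minimal sketch of this coherence score, assuming one embedding vector per paragraph is already available (for example, from the BERT encoder) and that the adjacent-pair similarities are averaged. The averaging step and the function names are assumptions, since the aggregation is not specified here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_coherence(paragraph_embeddings):
    # Mean cosine similarity over adjacent paragraph pairs (assumed aggregation).
    pairs = list(zip(paragraph_embeddings, paragraph_embeddings[1:]))
    if not pairs:
        # Assumption: a document with fewer than two paragraphs is trivially coherent.
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```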
Topic Progression
Topic transition markers (However, But, On the other hand) quantified relative to document length.
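One way to quantify this is to count marker occurrences and divide by the token count. The sketch below uses the English glosses of the markers and assumes length is measured in whitespace tokens; `TRANSITION_MARKERS`, `normalize`, and `transition_density` are hypothetical names.

```python
import re

# English glosses of the transition markers; the Bangla lexicon is not given here.
TRANSITION_MARKERS = ("however", "but", "on the other hand")

def normalize(text):
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    return " ".join(re.sub(r"[.,!?;:।\"'()]", " ", text.lower()).split())

def transition_density(text, markers=TRANSITION_MARKERS):
    norm = normalize(text)
    n_tokens = len(norm.split())
    if n_tokens == 0:
        return 0.0
    count = 0
    for marker in markers:
        # Whitespace-delimited match so "but" does not fire inside "button".
        pattern = rf"(?<!\S){re.escape(marker)}(?!\S)"
        count += len(re.findall(pattern, norm))
    # Marker occurrences per token (assumed definition of "relative to document length").
    return count / n_tokens
```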
Argumentative Structure
Claim-to-evidence ratios using pattern matching for claims and evidence indicators.
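The pattern-matching idea can be sketched as below. The regular expressions are illustrative English placeholders, not the patterns used in this work, and `\b` word boundaries would need replacing for Bangla text; all names here are assumptions.

```python
import re

# Hypothetical indicator patterns (assumption; the actual patterns are not given).
CLAIM_PATTERNS = [r"\bclaims?\b", r"\balleged(?:ly)?\b", r"\bit is said\b"]
EVIDENCE_PATTERNS = [r"\baccording to\b", r"\breport(?:ed|s)?\b", r"\bdata\b"]

def count_matches(patterns, text):
    # Total number of matches of any pattern in the text.
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)

def claim_evidence_stats(paragraphs):
    n = len(paragraphs) or 1
    claims = sum(count_matches(CLAIM_PATTERNS, p) for p in paragraphs)
    evidence = sum(count_matches(EVIDENCE_PATTERNS, p) for p in paragraphs)
    if evidence:
        ratio = claims / evidence
    else:
        # No evidence markers: the ratio is unbounded if any claim was found.
        ratio = float("inf") if claims else 0.0
    return {
        "claims_per_paragraph": claims / n,
        "evidence_per_paragraph": evidence / n,
        "claim_to_evidence": ratio,
    }
```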
Training Configuration
Results & Analysis
Our interpretable framework achieves competitive performance while providing detailed explanations for classification decisions.
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Class 0 | 79% | 74% | 77% |
| Class 1 | 84% | 81% | 82% |
| Class 2 | 84% | 82% | 83% |
| Class 3 | 86% | 87% | 87% |
The F1-score decreases by only 0.11%, while the model gains complete interpretability and human-readable explanations.
Macro F1-score improved from 49.69% to 53.92%, indicating better handling of class imbalance.
Training converges faster, with the interpretable features providing a regularization effect.
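To illustrate why macro F1 is the more sensitive measure under class imbalance, the sketch below computes macro and support-weighted F1 from per-class counts. The counts are invented toy numbers for a two-class case, not results from this study, and the function names are assumptions.

```python
def f1(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_and_weighted_f1(per_class):
    # per_class maps a label to its (tp, fp, fn) counts.
    scores = {label: f1(*counts) for label, counts in per_class.items()}
    support = {label: tp + fn for label, (tp, fp, fn) in per_class.items()}
    total = sum(support.values())
    macro = sum(scores.values()) / len(scores)            # every class counts equally
    weighted = sum(scores[c] * support[c] for c in scores) / total
    return macro, weighted

# Toy counts only (not from the paper): a large class and a rare class.
toy_counts = {"authentic": (950, 30, 50), "fake": (10, 50, 30)}
macro, weighted = macro_and_weighted_f1(toy_counts)
```

With these toy counts the weighted score stays high because the majority class dominates, while the macro score is pulled down by the rare class.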
Feature Analysis
Systematic patterns distinguishing authentic from fabricated content reveal deceptive communication strategies.
Uncertainty Markers
Fake news shows 2.3x higher uncertainty expressions (p < 0.001)
Emotional Manipulation
Fake articles exhibit 2.1x more negative emotion markers (p < 0.001)
Cognitive Load
Elevated repetition (0.28 vs 0.15) and reduced vocabulary richness (0.72 vs 0.81)
Self-Reference
Authentic articles show 3.1x higher self-reference ratios (p < 0.001)
Semantic Coherence
Fake news shows 19% lower coherence scores (0.58 vs 0.72, p < 0.001)
Topic Transitions
Fake articles have 1.9x more topic transitions, suggesting disorganized narratives
Claim-Evidence Imbalance
Fake news: 0.45 claims vs 0.08 evidence per paragraph (5.6x ratio)
Argumentative Structure
Authentic news maintains balanced claim-to-evidence ratios (0.22 vs 0.18)
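The relative differences quoted above follow directly from the reported group means. A short check, using only values stated in this section (the helper names are illustrative):

```python
def relative_reduction(baseline, value):
    # Fractional drop of `value` relative to `baseline`.
    return (baseline - value) / baseline

def fold_change(numerator, denominator):
    # How many times larger the numerator is than the denominator.
    return numerator / denominator

# Semantic coherence: fake 0.58 vs authentic 0.72.
coherence_drop = relative_reduction(0.72, 0.58)      # about 0.19, i.e. 19% lower

# Fake news: 0.45 claims vs 0.08 evidence markers per paragraph.
fake_claim_evidence = fold_change(0.45, 0.08)        # about 5.6x

# Authentic news: 0.22 claims vs 0.18 evidence markers per paragraph.
authentic_claim_evidence = fold_change(0.22, 0.18)   # about 1.2x
```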
Model Architecture
Integrated architecture combining pre-trained BERT with interpretable psycholinguistic and discourse features.
• sagorsarker/bangla-bert-base
• 768-dimensional embeddings
• Contextual representations
• 17 linguistic markers
• Emotional, uncertainty, cognitive
• Deception patterns
• 5 discourse indicators
• Semantic coherence
• Argumentative structure
• Dense Layer: 790 → 768 (dropout=0.3)
• Output Layer: 768 → 4 classes
• Cross-entropy loss
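Read together, the components above suggest the following forward pass. This is a sketch under assumptions: the three feature groups are concatenated (which matches the input width 768 + 17 + 5 = 790), the hidden nonlinearity φ is not specified here, and applying dropout after the dense layer is assumed.

```latex
\begin{aligned}
\mathbf{x} &= [\,\mathbf{e}_{\text{BERT}};\ \mathbf{f}_{\text{psy}};\ \mathbf{f}_{\text{disc}}\,]
  \in \mathbb{R}^{768 + 17 + 5} = \mathbb{R}^{790} \\
\mathbf{h} &= \mathrm{Dropout}_{0.3}\big(\phi(W_1 \mathbf{x} + \mathbf{b}_1)\big),
  \qquad W_1 \in \mathbb{R}^{768 \times 790} \\
\hat{\mathbf{y}} &= \mathrm{softmax}(W_2 \mathbf{h} + \mathbf{b}_2),
  \qquad W_2 \in \mathbb{R}^{4 \times 768} \\
\mathcal{L} &= -\sum_{c=1}^{4} y_c \log \hat{y}_c
\end{aligned}
```

Here the vector e_BERT is the 768-dimensional BERT embedding, f_psy holds the 17 psycholinguistic features, f_disc holds the 5 discourse indicators, and y is the one-hot label over the 4 classes.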
All feature categories provide complementary information for classification decisions.
Conclusion & Resources
This research demonstrates that systematic integration of psycholinguistic theory with modern transformer architectures can maintain competitive performance while providing interpretable explanations.
First comprehensive psycholinguistic feature extraction system for Bangla with systematic discourse analysis integration.
The model reaches an 84.37% F1-score, only 0.11% below black-box approaches, while enabling detailed analysis.
Identifies specific linguistic markers enabling stakeholders to understand why content was flagged.
Md Mynoddin
Assistant Professor, Dept. of CSE, RMSTU
mynoddin@rmstu.ac.bd
Prathay Barua
Dept. of CSE, RMSTU
prathaybarua71@gmail.com
Ashraful Nuhash
Dept. of CSE, RMSTU
nuhashroxme@gmail.com
- Extending the framework to multimodal detection incorporating visual elements and deepfake detection
- Developing cross-lingual transfer capabilities for other low-resource languages
- Investigating adversarial robustness against evolving deceptive strategies
- Integrating large language models for dynamic feature generation rather than static lexicon matching
© 2025 RMSTU Department of Computer Science & Engineering