Authorship Attribution of Arabic Criminal Texts Using Large Language Models: A Comparative Evaluation of ChatGPT, DeepSeek, and Gemini
Abstract
This study investigates whether three large language models (LLMs), ChatGPT, DeepSeek, and Gemini, can attribute authorship of Arabic criminal texts in a zero-shot setting, that is, with no task-specific training or fine-tuning. Using a quantitative experimental design, each model attributes 24 anonymous criminal texts against reference writings from 12 Arabic authors. The results reveal limited effectiveness: only ChatGPT achieves a statistically significant accuracy of 25%, above the 8.3% chance level (1 in 12 candidate authors). These findings indicate that current LLMs in zero-shot settings are not reliable enough for definitive authorship attribution of short Arabic criminal texts, highlighting a gap between their general linguistic capabilities and the specific requirements of forensic textual analysis. Although LLMs show preliminary potential, in their current form they cannot replace human expertise in high-stakes forensic contexts involving Arabic texts.
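The accuracy figures in the abstract can be checked with a short calculation. The sketch below assumes that 25% of 24 texts corresponds to 6 correct attributions and that chance means uniform guessing among the 12 candidate authors. The one-sided exact binomial test is an illustrative choice only, since this section does not state which significance test the study used. All function and variable names are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures taken from the abstract.
n_texts = 24      # anonymous criminal texts per model
n_authors = 12    # candidate authors
accuracy = 0.25   # reported accuracy for ChatGPT

# Assumption: chance is uniform guessing over the candidate authors.
chance = 1 / n_authors

# Assumption: 25% of 24 texts means 6 correct attributions.
correct = round(accuracy * n_texts)

# Assumption: a one-sided exact binomial test against the chance level.
p_value = binomial_tail(n_texts, correct, chance)

print(f"chance level: {chance:.1%}")
print(f"correct attributions: {correct} of {n_texts}")
print(f"one-sided exact binomial p-value: {p_value:.4f}")
```

Under these assumptions the tail probability falls below the conventional 0.05 threshold, which is consistent with the abstract describing the 25% result as statistically significant.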
Keywords:
Authorship attribution, forensic linguistics, Arabic criminal texts, LLMs
Article information
Journal
International Journal of Human Post-Edited AI Qualitative Data Analysis
Volume (Issue)
2(1) (2026)
Pages
1-13
Copyright
Copyright (c) 2026 Ibrahim Alharbi (Author)
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.