Data Saturation Reliability Theory: A Framework for Optimising AI Input Feeds
DOI: https://doi.org/10.55578/isgm.2509.006
Keywords: Artificial Intelligence, Data Saturation, Reliability, Input Feeds, Signal-to-Noise Ratio, Feedback Mechanisms, AI Governance, Data Quality, Machine Learning, Optimisation
Abstract
Artificial Intelligence (AI) systems increasingly rely on large and diverse data streams to support accurate, adaptive, and context-aware decision-making. However, beyond a certain point, adding new data can lead to diminishing or even negative returns due to redundancy, noise, and bias, a phenomenon known as data saturation. This paper introduces the Data Saturation Reliability (DSR) framework, a conceptual approach to optimising AI input feeds by balancing data volume, quality, and reliability. Drawing on principles from information theory, machine learning, and data governance, the DSR framework formalises saturation thresholds, signal-to-noise ratio assessment, temporal relevance, and dynamic feedback mechanisms as key factors for sustainable AI performance. By linking marginal information gain to input reliability, the DSR framework provides strategies to mitigate the risks of over-saturation, bias propagation, and operational inefficiency, while improving predictive accuracy and adaptive learning. The framework prioritises quality over quantity, encouraging intelligent curation of inputs rather than indiscriminate data collection. Applications include high-stakes fields such as healthcare diagnostics, financial forecasting, autonomous systems, and large-scale natural language processing, where real-time decision accuracy and reliability are vital. The paper highlights opportunities for empirical validation, cross-domain adaptation, and integration of DSR principles into AI lifecycle management and governance. Ultimately, the framework promotes a shift from the assumption that “more data equals better performance” towards an optimal data balance that supports operational effectiveness and ethical responsibility in AI deployment.
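To make the saturation-threshold idea concrete, the sketch below shows one way it could be operationalised: measure the marginal gain in a validation metric after each data increment and stop ingesting once the gain stays below a tolerance. This is a minimal illustrative sketch, not the paper's specification; the function name, tolerance, patience parameter, and the accuracy figures in the example are all hypothetical.

```python
"""Illustrative saturation-threshold stopping rule (hypothetical sketch)."""

from typing import Optional, Sequence


def saturation_point(scores: Sequence[float],
                     min_gain: float = 0.002,
                     patience: int = 3) -> Optional[int]:
    """Return the increment index at which data saturation is declared, or None.

    scores   -- validation metric measured after each data increment
                (e.g. accuracy after 10k, 20k, 30k ... examples).
    min_gain -- marginal improvement below which an increment is treated
                as contributing redundancy or noise rather than signal.
    patience -- number of consecutive low-gain increments required
                before declaring saturation.
    """
    low_gain_streak = 0
    for i in range(1, len(scores)):
        marginal_gain = scores[i] - scores[i - 1]
        if marginal_gain < min_gain:
            low_gain_streak += 1
            if low_gain_streak >= patience:
                return i  # saturation threshold reached at this increment
        else:
            low_gain_streak = 0  # a useful increment resets the streak
    return None  # no saturation detected in the observed range


if __name__ == "__main__":
    # Synthetic learning curve whose gains flatten out (illustrative values).
    curve = [0.70, 0.78, 0.83, 0.86, 0.871, 0.872, 0.8725, 0.8727]
    print(saturation_point(curve))  # -> 7 with the defaults above
```

In a DSR-style pipeline, such a rule would gate further ingestion: increments past the returned index add cost and noise without a commensurate gain in reliability, so curation effort is better spent on input quality than on volume.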
Data Availability Statement
This study exclusively used data obtained from secondary sources through a comprehensive literature review. All referenced data are publicly accessible and have been appropriately cited within the manuscript.
License
Copyright (c) 2025 Michael Mncedisi Willie, Siyabonga Jikwana, Lesiba Arnold Malotana, Zwanaka James Mudara

This work is licensed under a Creative Commons Attribution 4.0 International License.