AUTOMATED DISEASE PREDICTION AND DIAGNOSIS FLOWCHART GENERATION FROM MEDICAL REPORTS USING NLP AND RANDOM FOREST

Author(s): V Raguvaran, Prof. Ebbie Selvakumar
Abstract

In the digital healthcare age, quick and accurate interpretation of patient information is essential for timely diagnosis and treatment. This work proposes an intelligent system for automated disease prediction and diagnostic flowchart generation from real medical reports in PDF form. The pipeline begins by extracting text from clinical reports using PDF parsing. Natural Language Processing (NLP) operations such as tokenization, lemmatization, Named Entity Recognition (NER), and keyword extraction (through TF-IDF or RAKE) are then used to structure and process the extracted text. A Random Forest classifier subsequently predicts the most likely disease from the detected symptoms and other relevant features. After prediction, the corresponding predefined diagnostic flowchart is rendered using visualization libraries such as Graphviz or Mermaid.js. The final output, consisting of the predicted disease, the extracted medical terminology, and the graphical diagnosis pathway, is shown to the user to aid comprehension and support decision-making. The system is intended to provide an effective, explainable, and user-friendly means of supporting clinical evaluation and healthcare delivery.
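As a rough illustration of two stages named in the abstract, the sketch below scores keywords with a smoothed TF-IDF and renders a predefined diagnostic pathway as Mermaid flowchart text. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `FLOWCHARTS` table, and the diagnostic steps are hypothetical placeholders, and the PDF parsing and Random Forest stages are left out, with the predicted label simply supplied by hand.

```python
import math
import re
from collections import Counter

# Hypothetical lookup of predefined diagnostic steps. Placeholder text only:
# not clinical guidance and not taken from the paper.
FLOWCHARTS = {
    "diabetes": ["Fasting glucose test", "HbA1c test", "Confirm diagnosis"],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc, corpus, top_k=3):
    """Rank the terms of `doc` by smoothed TF-IDF against `corpus`."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        df = sum(term in s for s in doc_sets)  # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def to_mermaid(disease, steps):
    """Render a linear pathway as Mermaid flowchart text.

    Assumes the disease name and step labels contain no double quotes.
    """
    lines = ["flowchart TD", f'    n0["Predicted: {disease}"]']
    for i, step in enumerate(steps, start=1):
        lines.append(f'    n{i}["{step}"]')
        lines.append(f"    n{i - 1} --> n{i}")
    return "\n".join(lines)


if __name__ == "__main__":
    reports = [
        "patient reports fever and cough",
        "patient reports thirst and fatigue",
        "patient reports chest pain",
    ]
    print(tfidf_keywords(reports[1], reports, top_k=2))
    predicted = "diabetes"  # stand-in for the classifier's output
    print(to_mermaid(predicted, FLOWCHARTS[predicted]))
```

In a fuller pipeline, the keyword scores would feed a trained classifier, and its predicted label would select the entry in the flowchart table. Keeping the flowcharts as plain data makes it straightforward to swap the Mermaid renderer for a Graphviz one.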