Tackling patient non-compliance for mammography screenings using a robust Data Warehouse


Business Objective

Building a secure Data Warehouse for tackling the specific challenge of patient non-compliance for mammography screenings


Building a secure data warehouse for a holistic view of patient data


Enhanced Patient Monitoring and Data-Driven Insights


Patient risk stratification, identifying high-risk patients for non-compliance


Enabling targeted interventions for non-compliant patients

Why build a Secure Healthcare Data Warehouse?

A healthcare data warehouse (DW) is a centralized repository that integrates, structures, stores, and processes PHI and clinical data from disparate sources for analytical querying and reporting. As the volume and complexity of critical data (PHI, clinical, supply, distribution, and operational data) keep growing, data warehousing in healthcare has never been more important. Timely collection and storage of data, and timely retrieval of insights from it, are crucial to delivering optimal healthcare and improving patient health outcomes. Healthcare is a highly data-intensive industry, and a DW can help improve clinical outcomes, optimize staff management and procurement, and reduce operating costs. Yet healthcare still lags other industries in data maturity and in being data-driven, and the key reason is a lack of centralized data sources.

Approach to building a Data Warehouse and tackling patient non-compliance

Data Source Layer

  • Healthcare data, primarily PHI (protected health information) along with non-PHI data, was sourced from several decentralized and diverse sources such as patient portals, mammograms, radiologist reports, and clinical notes.

Data Integration and ETL

  • As a first step, we at Finarb carried out data modeling: we identified the data types that needed to be captured and built a dimensional data model comprising several dimension tables, covering, for example, root-level PHI data and high-level data on the frequency of mammography screenings.
  • We then extracted and ingested structured and unstructured healthcare data from the disparate source databases while maintaining stringent data security measures.
  • After that, we put controls in place over how healthcare data is loaded and managed.
  • Maintaining Data Integrity: We conducted stringent data validation checks, resolving data anomalies and inconsistencies and addressing data quality issues.
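
The validation step above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the record layout, the field names (`patient_id`, `date_of_birth`, `screening_date`) and the specific rules are assumptions chosen only to show what completeness, consistency, and duplicate checks might look like before records are loaded.

```python
from datetime import date

# Hypothetical required fields; the real schema is not described in the source.
REQUIRED_FIELDS = ("patient_id", "date_of_birth", "screening_date")


def validate_record(record, today=None):
    """Return a list of issues found in one screening record (empty if clean)."""
    today = today or date.today()
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Consistency: dates must be in a plausible order.
    dob = record.get("date_of_birth")
    screened = record.get("screening_date")
    if isinstance(dob, date) and isinstance(screened, date):
        if screened < dob:
            issues.append("screening_date precedes date_of_birth")
        if screened > today:
            issues.append("screening_date is in the future")
    return issues


def split_valid_invalid(records, today=None):
    """Separate clean records from those needing review, flagging duplicates."""
    seen = set()
    valid, rejected = [], []
    for record in records:
        issues = validate_record(record, today)
        # Uniqueness: the same patient screened on the same date is a duplicate.
        key = (record.get("patient_id"), record.get("screening_date"))
        if key in seen:
            issues.append("duplicate patient_id/screening_date")
        seen.add(key)
        if issues:
            rejected.append((record, issues))
        else:
            valid.append(record)
    return valid, rejected
```

Rejected records are returned together with their issue lists, so anomalies can be reviewed and resolved rather than silently dropped.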

Data Security and Regulatory Compliance

  • We have built the DW on HIPAA-compliant Azure Data Factory and have hosted it on secure Azure Cloud.
  • To safeguard data privacy, Finarb has established robust guidelines for protecting PHI data assets, including encryption of raw PHI and non-PHI data both at rest and before transfer to Azure Data Factory.
  • We have carried out data transformation in compliance with healthcare regulations and standards (HIPAA, FDA, and HL7 requirements).
  • The final DW has been hosted inside the client’s private VNet, giving complete control to the client over their data.

Data storage & Scalability

  • Finarb has stored the data in a centralized, structured Azure SQL database in a ready-to-analyze form. The DW is hosted on Azure Cloud and has been made scalable to handle increasing data volumes.
  • A microservices architecture and APIs allow future integrations to be added to this DW.

Feature engineering and predictive modeling for risk stratification of patients

  • We have chosen the most important variables or features directly or indirectly impacting a patient's compliance.
  • Key identified features include actual screening procedure time, patient wait time, delay in procedure start, appointment timing, weekday/weekend examinations, number of patients handled by staff before the current patient, etc.
  • Finarb's predictive model has been trained on the feature-engineered dataset to predict each patient's likelihood of compliance. This has made it possible to identify high-risk patients, that is, those more likely to be non-compliant.
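
As an illustration of the feature engineering described above, the following Python sketch derives the listed features from a single appointment record. The record layout (`staff_id`, `scheduled_at`, `checked_in_at`, `started_at`, `ended_at`) and both function names are hypothetical; the source does not describe the actual schema or the model that consumes these features.

```python
from datetime import datetime


def minutes_between(start, end):
    """Elapsed time from start to end, in minutes."""
    return (end - start).total_seconds() / 60


def patients_handled_before(visit, all_visits):
    """Count same-day visits the same staff member started before this one."""
    return sum(
        1
        for other in all_visits
        if other["staff_id"] == visit["staff_id"]
        and other["started_at"].date() == visit["started_at"].date()
        and other["started_at"] < visit["started_at"]
    )


def build_features(visit, all_visits):
    """Derive compliance-related features for one visit (hypothetical schema)."""
    scheduled = visit["scheduled_at"]
    return {
        # Actual screening procedure time.
        "procedure_minutes": minutes_between(visit["started_at"], visit["ended_at"]),
        # Time the patient waited between check-in and procedure start.
        "wait_minutes": minutes_between(visit["checked_in_at"], visit["started_at"]),
        # Delay in procedure start relative to the schedule (never negative).
        "start_delay_minutes": max(0.0, minutes_between(scheduled, visit["started_at"])),
        # Appointment timing and weekday/weekend flag (Saturday=5, Sunday=6).
        "appointment_hour": scheduled.hour,
        "is_weekend": scheduled.weekday() >= 5,
        # Staff workload before the current patient.
        "patients_before": patients_handled_before(visit, all_visits),
    }
```

Each resulting dictionary corresponds to one row of a training dataset; any classifier that outputs a compliance probability could be trained on such rows.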

Identification of Compliance Drivers & Recommended Interventions

  • We have further clustered high-risk patients into sub-segments based on their top drivers of non-compliance. For each such sub-segment, we have trained the model to find common denominator driver(s), which could be used to customize interventions for improving patient compliance.
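
One simple way to picture this sub-segmentation is sketched below in Python. It is a deliberate simplification and not the clustering method used in the project: it assumes each patient already has a predicted non-compliance `risk` and a dictionary of per-feature `contributions` to that risk (for example, from a feature-attribution method), and it groups high-risk patients by their single largest contributor. The field names and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def top_driver(contributions):
    """Return the feature contributing most to predicted non-compliance risk."""
    return max(contributions, key=contributions.get)


def segment_by_top_driver(patients, risk_threshold=0.5):
    """Group high-risk patients into sub-segments keyed by their top driver."""
    segments = defaultdict(list)
    for patient in patients:
        # Only patients at or above the (assumed) risk threshold are segmented.
        if patient["risk"] >= risk_threshold:
            driver = top_driver(patient["contributions"])
            segments[driver].append(patient["patient_id"])
    return dict(segments)
```

A segment keyed by `wait_minutes`, for instance, would suggest an intervention aimed at scheduling or waiting times, while a segment keyed by `is_weekend` would point to appointment-day preferences.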

Critical Success Factors

  • Effective consolidation of data from multiple structured and unstructured data sources and healthcare IT systems
  • Building a reliable ETL pipeline
  • Compliance with healthcare regulations and data standards such as HIPAA and HL7, and ensuring compatibility when exporting data
  • Following data interoperability guidelines and healthcare regulations during data transformation
  • Accurate feature engineering for ML model training
  • Accurate identification of non-compliance drivers per patient for custom interventions

Functional Architecture of a Data Warehouse in Healthcare

Results and Benefits of a Data Warehouse towards improving patient compliance

  • Improved Compliance: Achieved 37% higher patient compliance rates through AI/ML techniques and personalized outreach, leading to more timely and frequent screenings for early detection and treatment.
  • Enhanced Efficiency: Streamlined processes and improved resource utilization by leveraging patient compliance monitoring, allowing staff to focus on core tasks instead of repetitive ones.
  • Targeted Interventions: Implemented targeted outreach programs for high-risk patients, optimizing resource allocation and improving overall efficiency.
  • Granular Business Insights: Consolidated disparate data sources, enabling quick access and valuable insights for data-driven decision-making.
  • Enhanced Patient Experience: Integrated data-driven personalized services and intelligent recommendations, improving patient satisfaction.

