I’m a CTO at John Snow Labs, helping healthcare & life science companies put AI to good use. My interests include natural language processing, applied artificial intelligence in healthcare, and responsible AI.


NLP in Healthcare

Natural Language Processing

Responsible AI

Data Driven Healthcare

  • Accelerating the adoption of precision oncology with artificial intelligence (AI). With Elise Berliner, Lulu Lee, and Vishakha Sharma, at the Real-World Evidence Industry Summit of Experts [R.I.S.E.], May 2023
  • Real-World Lessons from Applying Natural Language Processing to Personalized Healthcare. O’Reilly AI Superstream, March 2021
  • How SelectData uses AI to better understand home health patients.
    With Alberto Andreotti, Stacy Ashworth and Tawny Nichols. Strata Data Conference, New York, September 2018.
  • Improving patient flow forecasting at Kaiser Permanente.
    With Santosh Kulkarni. Strata Data Conference, in San Jose, California, March 2018.
  • Clinical natural language understanding at scale.
    EU Data Science Summit, in Tel Aviv, Israel, June 2016.
  • Moving Beyond Templates and Coercion to Improve Physician Documentation.
    With Jill Wolf, at the 23rd Annual WEDI National Conference, in Hollywood, CA, May 2014.
  • Data driven approach to revenue capture process improvement.
    With Gene Boerger, at the AHIMA 85th Convention, in Atlanta, GA, October 2013.
  • Data driven models to minimize hospital readmissions.
    With Miriam Paramore, at the 2013 Strata Rx Conference, in Boston, MA, September 2013.

Data Science in Production

AI Platform Architecture

  • How to build an open source data science platform.
    Half-day tutorial at TDWI Anaheim, California, August 2018.
  • Building a new predictive model & API in 30 minutes.
    With Claudiu Barbura, at PAPIs.io — The Predictive APIs and Apps Conference, in Barcelona, Spain, November 2014.
  • Building an intelligent big data app in 30 minutes.
    With Claudiu Barbura, at the Strata Barcelona Conference, in Barcelona, Spain, November 2014.
  • Lessons Learned from Embedding Cassandra in an enterprise-grade big data platform.
    With Claudiu Barbura, at Cassandra Day Seattle, in Bellevue, WA, July 2014.
  • Leveraging a big data infrastructure to accelerate the data science workflow.
    At the 5th Timisoara Big Data Meetup, in Timisoara, Romania, June 2014.

Machine Learning for Fraud Detection

  • Hunting Criminals with Hybrid Analytics.
    At Data by the Bay, San Francisco, May 2017; Global Data Science Conference 2016, in Santa Clara, CA, March 2016; and IBM Datapalooza, in Seattle, WA, February 2016.
  • Architecting a predictive, petabyte-scale, self-learning fraud detection system.
    Global Predictive Analytics Conference, Santa Clara, CA, March 2017.
  • Online Predictive Modeling of Fraud Schemes from Multiple Live Streams.
    With Claudiu Branzan, at Spark Summit East, in New York, NY, February 2016.
  • Online fraud detection: A reference architecture for adversarial learning.
    At MLConf Atlanta, in Atlanta, GA, September 2015.
  • Hunting Criminals with Hybrid Analytics, Semi-supervised Learning & Agent Feedback.
    With Claudiu Branzan, at the Smart Data Conference, in San Jose, CA, August 2015, and at Strata + Hadoop World, in London, UK, May 2015.
  • Active learning from streams of graph, language & time series signals.
    With Claudiu Branzan, at the Data Science Summit & Dato Conference, in San Francisco, CA, July 2015.


Natural Language Processing in Healthcare

Parallel Computer Scheduling & Workload Modeling

Agile Software Development

Model Driven Software Engineering



Data Science

Software Engineering


Spark NLP

John Snow Labs' Spark NLP is an open source text processing library for Python, Java, and Scala. It provides production-grade, scalable, and trainable versions of the latest research in natural language processing.

Healthcare NLP

Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining. It provides healthcare-specific annotators, pipelines, models, and embeddings for clinical entity recognition, clinical entity linking, entity normalization, assertion status detection, de-identification, relation extraction, and spell checking and correction. It also includes over 4000 pre-trained healthcare models that can recognize clinical, drug, risk-factor, anatomy, demographic, and sensitive-data entities.

Visual NLP

Visual NLP is another commercial extension of Spark NLP, for optical character recognition (OCR) from images, scanned PDF documents, and DICOM files. It is a software library built on top of Apache Spark. It provides several image pre-processing features for improving text recognition results, such as adaptive thresholding and denoising, skew detection and correction, adaptive scaling, layout analysis and region detection, image cropping, and background object removal.

NLP Test

The open-source nlptest Python library enables data scientists to generate tests, run tests, and augment data in order to create language models that are more robust, fair, unbiased, representative, and accurate. It supports the testing of multiple popular NLP frameworks and tasks, including large language models.


Annotation Lab

The Annotation Lab is a robust data labeling and AI/ML solution for the cloud. It enables customers to annotate their data and generate their models in a simple, fast, and efficient project-based workflow.

Visual Co-Plot

A statistical analysis tool tailored for datasets with few observations and many variables, which may be intercorrelated. Co-Plot supports visual analysis of observations, variables, and the correlations between them, all together.


John Snow Labs

John Snow Labs, an AI and NLP for healthcare company, provides state-of-the-art software, models, and data to help healthcare and life science organizations build, deploy, and operate AI projects.

Pacific AI

Pacific AI provides consulting CTO services for high-growth software companies, specializing in applying AI, big data and data science. I conduct technology due diligence, board reviews and AI strategy & architecture workshops.

Coalition for Health AI

The Coalition for Health AI is a community of academic health systems, organizations, and AI expert practitioners. Its mission is to provide guidelines regarding an ever-evolving landscape of health AI tools to ensure high quality care, increase credibility amongst users, and meet health care needs. I lead the fairness, equity, and bias mitigation workgroup at CHAI.

Forbes Technology Council

I'm a member of the Forbes Technology Council, an invitation-only community for CIOs, CTOs and technology executives. It offers a curated network of successful peers, access to a variety of benefits and resources, and the opportunity to submit thought leadership articles and short tips on industry-related topics for publishing on Forbes.com.

Data Science in Healthcare

I started and run the LinkedIn group for data science in healthcare. I do my best to keep the discussion open, evidence-based, and vendor-neutral, focused on recent research results and field case studies.