I’m a CTO at John Snow Labs, helping healthcare & life science companies put AI to good use. My interests include natural language processing, applied machine learning, and large-scale distributed systems.

Talks

Natural Language Processing

NLP in Healthcare

Data Science in Production

Data Driven Healthcare

  • CIO Visions UK Leadership Summit, April 2022.
  • O’Reilly AI Superstream, March 2021.
  • How SelectData uses AI to better understand home health patients.
    With Alberto Andreotti, Stacy Ashworth and Tawny Nichols. Strata Data Conference, New York, September 2018.
  • Improving patient flow forecasting at Kaiser Permanente.
    With Santosh Kulkarni. Strata Data Conference, in San Jose, California, March 2018.
  • Clinical natural language understanding at scale.
    EU Data Science Summit, in Tel Aviv, Israel, June 2016.
  • Moving Beyond Templates and Coercion to Improve Physician Documentation.
    With Jill Wolf, at the 23rd Annual WEDI National Conference, in Hollywood, CA, May 2014.
  • Data driven approach to revenue capture process improvement.
    With Gene Boerger, at the AHIMA 85th Convention, in Atlanta, GA, October 2013.
  • Data driven models to minimize hospital readmissions.
    With Miriam Paramore, at the 2013 Strata Rx Conference, in Boston, MA, September 2013.

AI Platform Architecture

  • How to build an open source data science platform.
    Half-day tutorial at TDWI Anaheim, California, August 2018.
  • Building a new predictive model & API in 30 minutes.
    With Claudiu Barbura, at PAPIs.io — The Predictive APIs and Apps Conference, in Barcelona, Spain, November 2014.
  • Building an intelligent big data app in 30 minutes.
    With Claudiu Barbura, at the Strata Barcelona Conference, in Barcelona, Spain, November 2014.
  • Lessons Learned from Embedding Cassandra in an enterprise-grade big data platform.
    With Claudiu Barbura, at Cassandra Day Seattle, in Bellevue, WA, July 2014.
  • Leveraging a big data infrastructure to accelerate the data science workflow.
    At the 5th Timisoara Big Data Meetup, in Timisoara, Romania, June 2014.

Machine Learning for Fraud Detection

  • Hunting Criminals with Hybrid Analytics.
    At Data by the Bay, San Francisco, May 2017; Global Data Science Conference 2016, in Santa Clara, CA, March 2016; and IBM Datapalooza, in Seattle, WA, February 2016.
  • Architecting a predictive, petabyte-scale, self-learning fraud detection system.
    Global Predictive Analytics Conference, Santa Clara, CA, March 2017.
  • Online Predictive Modeling of Fraud Schemes from Multiple Live Streams.
    With Claudiu Branzan, at Spark Summit East, in New York, NY, February 2016.
  • Online fraud detection: A reference architecture for adversarial learning.
    At MLConf Atlanta, in Atlanta, GA, September 2015.
  • Hunting Criminals with Hybrid Analytics, Semi-supervised Learning & Agent Feedback.
    With Claudiu Branzan, at the Smart Data Conference, in San Jose, CA, August 2015, and at Strata + Hadoop World, in London, UK, May 2015.
  • Active learning from streams of graph, language & time series signals.
    With Claudiu Branzan, at the Data Science Summit & Dato Conference, in San Francisco, CA, July 2015.

Research

Natural Language Processing in Healthcare

Parallel Computer Scheduling & Workload Modeling

Agile Software Development

Model Driven Software Engineering

Patents

Healthcare

Data Science

Software Engineering

Software

Spark NLP

John Snow Labs' Spark NLP is an open source text processing library for Python, Java, and Scala. It provides production-grade, scalable, and trainable versions of the latest research in natural language processing.

Healthcare NLP

Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining. It provides healthcare-specific annotators, pipelines, models, and embeddings for clinical entity recognition, clinical entity linking, entity normalization, assertion status detection, de-identification, relation extraction, and spell checking and correction. It also includes over 4000 pre-trained healthcare models that can recognize entity types such as clinical, drugs, risk factors, anatomy, demographics, and sensitive data.

Spark OCR

Spark OCR is another commercial extension of Spark NLP for optical character recognition (OCR) from images, scanned PDF documents, and DICOM files. It is a software library built on top of Apache Spark. It provides several image pre-processing features for improving text recognition results, such as adaptive thresholding and denoising, skew detection & correction, adaptive scaling, layout analysis and region detection, image cropping, and background object removal.

Annotation Lab

The Annotation Lab is a robust data labeling and AI/ML solution for the cloud. It enables customers to annotate their data and generate their models in a simple, fast and efficient project-based workflow.

Visual Co-Plot

A statistical analysis tool, tailored for datasets with few observations and many variables which may be intercorrelated. Co-Plot enables visually analysing observations, variables and the correlations between them together.

Groups

John Snow Labs

John Snow Labs, an AI and NLP for healthcare company, provides state-of-the-art software, models, and data to help healthcare and life science organizations build, deploy, and operate AI projects.

Pacific AI

Pacific AI provides consulting CTO services for high-growth software companies, specializing in applying AI, big data and data science. I conduct technology due diligence, board reviews and AI strategy & architecture workshops. An elite team of data scientists, data engineers and data researchers delivers complete projects.

Forbes Technology Council

I'm a member of the Forbes Technology Council, an invitation-only community for CIOs, CTOs and technology executives. It's a curated network of successful peers that provides access to a variety of benefits and resources and includes the opportunity to submit thought leadership articles and short tips on industry-related topics for publishing on Forbes.com.

Data Science in Healthcare

I started and run the LinkedIn group for data science in healthcare. I do my best to keep the discussion open, evidence-based and vendor-neutral, focused on recent research results and field case studies.