I'm based in Seattle and am a Consulting CTO, helping build products and teams that apply AI, big data and data science - mostly in healthcare. My interests include large-scale distributed systems, machine learning, natural language processing and agile methods.

Talks

Natural Language Understanding

Data Science in Production

Data Driven Healthcare

AI Platform Architecture

  • How to build an open source data science platform.
    Half-day tutorial at TDWI Anaheim, California, August 2018.
  • Building a new predictive model & API in 30 minutes.
    With Claudiu Barbura, at PAPIs.io — The Predictive APIs and Apps Conference, in Barcelona, Spain, November 2014.
  • Building an intelligent big data app in 30 minutes.
    With Claudiu Barbura, at the Strata Barcelona Conference, in Barcelona, Spain, November 2014.
  • Lessons Learned from Embedding Cassandra in an enterprise-grade big data platform.
    With Claudiu Barbura, at Cassandra Day Seattle, in Bellevue, WA, July 2014.
  • Leveraging a big data infrastructure to accelerate the data science workflow.
    At the 5th Timisoara Big Data Meetup, in Timisoara, Romania, June 2014.

Machine Learning for Fraud Detection

  • Hunting Criminals with Hybrid Analytics.
    At Data by the Bay, San Francisco, May 2017; Global Data Science Conference 2016, in Santa Clara, CA, March 2016; and IBM Datapalooza, in Seattle, WA, February 2016.
  • Architecting a predictive, petabyte-scale, self-learning fraud detection system.
    Global Predictive Analytics Conference, Santa Clara, CA, March 2017.
  • Online Predictive Modeling of Fraud Schemes from Multiple Live Streams.
    With Claudiu Branzan, at Spark Summit East, in New York, NY, February 2016.
  • Online fraud detection: A reference architecture for adversarial learning.
    At MLConf Atlanta, in Atlanta, GA, September 2015.
  • Hunting Criminals with Hybrid Analytics, Semi-supervised Learning & Agent Feedback.
    With Claudiu Branzan, at the Smart Data Conference, in San Jose, CA, August 2015, and at Strata + Hadoop World, in London, UK, May 2015.
  • Active learning from streams of graph, language & time series signals.
    With Claudiu Branzan, at the Data Science Summit & Dato Conference, in San Francisco, CA, July 2015.

Research

Parallel Computer Scheduling & Workload Modeling

Agile Software Development

Model Driven Software Engineering

Patents

Healthcare

Data Science

Software Engineering

Software

Natural Language Processing

An Apache-licensed natural language processing library built on top of Apache Spark and its Spark ML library. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

Visual Co-Plot

A statistical analysis tool, tailored for datasets with few observations and many variables which may be intercorrelated. Co-Plot enables visually analysing observations, variables and the correlations between them together.

Parallel Workload Analyser

A tool for analysing parallel computer workloads in standard workload format. Computes self-similarity, auto-correlation, distributions, time series, per-month and summary statistics.

Groups

Pacific AI

Pacific AI provides consulting CTO services for high-growth software companies, specializing in applying AI, big data and data science. I conduct technology due diligence, board reviews and AI strategy & architecture workshops. An elite team of data scientists, data engineers and data researchers delivers complete projects.

Forbes Technology Council

I'm a member of the Forbes Technology Council, an invitation-only community for CIOs, CTOs and technology executives. It's a curated network of successful peers that provides access to a variety of benefits and resources, including the opportunity to submit thought leadership articles and short tips on industry-related topics for publishing on Forbes.com.

Data Science in Healthcare

I started and run the LinkedIn group for data science in healthcare. I do my best to keep the discussion open, evidence-based and vendor-neutral, focused on recent research results and field case studies.