My soothsayer friend BG told me last year that “deep learning is the next big thing”. I didn’t know what that meant. A few days ago, I attended the AIFrontiers conference in Santa Clara, California. Now I have a glimpse of what he meant :-)

What is Intelligence?

In this context, I take “intelligence” to mean “smart”. Yes, we have smart phones, smart TVs, and smart speakers. But imagine far smarter software and devices… like self-driving cars!

Note that Artificial Intelligence is about understanding intelligence, while Machine Learning is a “brute force”, data-driven approach to simulating intelligence. They are related but not the same thing. There are many areas that will lead to Artificial General Intelligence (AGI), which means “software that can do any task”, as opposed to Machine Learning, which creates software that can do specific tasks. This conference was about Machine Learning, and specifically Deep Learning.

To summarize the scope of the areas, Artificial Intelligence > Machine Learning > Deep Learning.

From Analog to Digital to Intelligence

The mantra at this conference was that we will move from a software stack to an intelligence stack to solve future engineering challenges.

This was best explained by the legendary Jeff Dean in his keynote speech, talking about how many products at Google use deep learning:

Deep Learning at Google

What is Machine Learning?

Machine learning is one technique to achieve intelligence.

What is machine learning? My understanding is: it is about making computer programs whose behavior is learned from data instead of solely based on lines of code written by humans. Think spam filters – whenever we click on “Spam” or “Not Spam” buttons, the spam filtering system learns from this and the behavior changes over time to reflect that, without somebody explicitly writing code for every single email. On top of this idea, design the system to learn by itself, and it can learn and improve orders of magnitude faster.
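To make the spam-filter idea concrete for myself, here is a toy sketch in Python. It is not how any real mail provider works; the class name `TinySpamFilter` and the simple word-count scoring are my own assumptions, picked only to show behavior coming from labelled examples rather than from hand-written rules.

```python
from collections import Counter


class TinySpamFilter:
    """Toy filter: behavior comes from labelled examples, not hand-written rules."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def learn(self, text, is_spam):
        # Called whenever a user clicks "Spam" or "Not Spam".
        target = self.spam_words if is_spam else self.ham_words
        target.update(text.lower().split())

    def looks_like_spam(self, text):
        # Score each word by how often it was seen under each label.
        score = 0
        for word in text.lower().split():
            score += self.spam_words[word] - self.ham_words[word]
        return score > 0


spam_filter = TinySpamFilter()
spam_filter.learn("win free money now", is_spam=True)
spam_filter.learn("lunch meeting at noon", is_spam=False)
```

Nobody wrote a rule saying "free money is spam"; the filter's answer changes only because of the examples it was fed.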

What makes Machine Learning special? The system learns behaviors that are more accurate for the task and can handle more situations than the algorithms we humans could have imagined! Think converting sentences from one human language to another, self-driving cars, etc. Think of all the situations that such systems need to handle. We could not have written code to handle every situation.

Why now? Because machine learning requires:

  1. Lots of data – which we have now thanks to (a) so many people buying mobile phones, (b) mobile phone sensors and apps generating so much data.
  2. Lots of computers – which we have now thanks to cloud computing.
  3. Lots of parallel processing power (think matrix multiplications) – which we have now thanks to Graphics Processing Units (GPUs).

What is Deep Learning?

What is deep learning? It is a machine learning technique that is based on “layers of neurons”, i.e. think of millions of neurons in your human brain that work together to understand, perceive, store knowledge… deep learning tries to simulate your brain. At least, that’s the way I understood it.
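A rough way to picture “layers of neurons” is the sketch below: each neuron takes a weighted sum of its inputs and squashes it, and each layer feeds the next. The weights here are hand-picked, hypothetical numbers; in a real system they would be learned from data, and nothing here is taken from the talk's slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    # Each neuron: weighted sum of the inputs plus a bias, then a non-linearity.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # The output of one layer becomes the input of the next one.
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward([1.0, 0.0], network)
```

“Deep” simply means there are many such layers stacked between input and output.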

Jeff Dean explains deep learning

What do you want in a Machine Learning System?

Jeff Dean talked about their first internal machine learning system, the problems they faced, and what they ideally wanted:

What do you want in a Machine Learning System?
Computation Time and Research Productivity

And eventually they designed TensorFlow to achieve those desirable features.

He went on to mention the algorithms they use for different products, which I found interesting, not because I understood what they meant, but because they are pointers in case you want to learn more. After all, the whole point of attending conferences and meetups is to know what is happening out there.

Speech Recognition
Google Photos Search
Google Search
Language Translation

Some of these models can be found at https://github.com/tensorflow/models.

Jeff Dean also mentioned the kind of impact they have had on products, esp. converting April Fool’s Day jokes into reality:

Google Inbox Smart Reply
Algorithms behind Google Inbox Smart Reply

Jeff Dean expects more reuse of machine learning-developed models across different tasks, described as zero-shot learning:

Zero-shot learning

And more compute-based model generation:

More compute

Jeff Dean also gave a glimpse of what kind of queries they hope to achieve in the future:

Google Search queries of the future

Autonomous Driving

There was a lot of info throughout the day, so I’ll only post the topics / slides from the discussions that I found interesting:

Speakers were from Waymo (Google), Tesla Motors (not in official capacity), Baidu Autonomous Driving Unit.
Google / Waymo designing a car specifically for autonomous driving

Baidu also played videos of their self-driving cars in China, so this is not just a USA-only phenomenon. China, indeed, may have an edge in AI.

Big Data and Machine Learning in the car

This is one reason why I feel C++, the beast, is making a comeback: performance and efficient hardware usage are important again, because we now have to run a lot of processing on the Internet of Things, especially self-driving cars. And because it’s C++, correctness becomes a new risk. This might give a clue as to why Tesla Motors attracted Chris Lattner, the creator of the LLVM compiler. Speculation is that Tesla Motors wants to build an integrated autopilot system from chip to compiler.

Computer Chips specifically for autonomous driving

From Google creating custom chips called Tensor Processing Units (“TPU”) for machine learning model generation in the cloud, to NVidia making chips for self-driving cars, to Intel releasing its Go platform containing 5G modems and chips for self-driving cars, efficient and performant chips for machine learning have become important. This explains why NVidia’s shares have gone up 225% in 2016.

The car is one node of the Internet of Things. It will connect and interact with the cloud.

This is very familiar to me because that is what we do at Automatic.

Speech-Enabled Assistants

Speakers were from Microsoft, Baidu, Amazon Alexa.

Microsoft:

Speech is not the same as text processing; there are more nuances.
Types of chatbots

Baidu:

Why deep learning
Handle issues such as background noise and multiple people speaking
Handle issues such as person speaking from other end of room
They converted existing voice recordings to far-field and used that to train models
How much compute power, you ask?
GPUs to the rescue
Deep Speech works for Mandarin
Deep Speech works for multiple languages
Why focus on speech? More inclusive and faster than typing.
Speech recognition can be more accurate than typing for non-technical people
Try the TalkType app for Android
Baidu’s Goal is AI for 100 million people

Amazon Alexa:

Speech recognition process
‘LSTM’ technique

See Wikipedia entry on Long short-term memory.

More techniques

Natural Language Processing

Speaker was from Google Brain

He talked about how deep learning has dramatically changed the field of NLP. Focused on “end-to-end” deep learning methods.

Computer Vision (Perception)

Speakers were from OpenCV, Bosch and Google

An example of using computer vision is from Jeff Dean’s keynote speech – https://www.google.com/get/sunroof – enter your address, it will tell you how much roof area you have and how much money you can save by switching to solar energy!

OpenCV is a popular open source computer vision library:

OpenCV 3
Deep Learning comes to OpenCV

Google:

Street View to Vision processing to Local Business discovery, cars, cameras, vision, and maps – all in one sentence
New machine learning techniques, better data and compute, you get the idea.
Future of Perception

Impact of AI on jobs

Speaker was from McKinsey
McKinsey study focus
Based on current AI/ML capabilities: Few jobs will be fully automatable. Most jobs will only be partially automatable. That’s a relief!

Internet of Things

Speakers were from Bosch, Nervana (Intel) and Vion

Vion Vision was the most interesting. They are deploying machine learning models to devices like cameras. They demonstrated their bus-counting cameras that help bus operators get real-time traffic so that they can deploy more buses on high-traffic routes, etc. They even had a demo of public-area cameras that auto-detect a crowd beating up a person and send an alert to the local police station.

Vion Vision cameras
Camera counting
Custom chip for deep learning

Deep Learning Frameworks

Speakers were from Google, Facebook and Amazon

This was an amazing session where creators or prominent members of each Deep Learning Framework came up and talked about their thoughts on the framework status and future.

Rethinking slow float-based computation
Math Challenges
Unframework?
MAPS
  • Scalability – How do I train on multiple GPUs and CPUs? OpenMPI, NCCL, ZeroMQ, etc.
  • Portability – Cloud, Mobile, IoT, cars, drones, coffee makers. Constraints – limited computation, battery life, models may be luxurious, ecosystem less developed
  • Augmented Computation Patterns – more than float dense math – quantized computation, sparse math libs, model compression, rethinking existing ops (ResNEXT)
  • Augmented Math Challenges
  • Modularity – reusability
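The “quantized computation” item was new to me, so here is a minimal sketch of the idea as I understand it: store weights as small integers plus one shared scale factor instead of full floats. The function names and the symmetric 8-bit scheme are my own assumptions, not something shown by the speakers.

```python
def quantize(values, num_bits=8):
    """Map floats onto signed integers using one shared scale factor."""
    max_int = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(v) for v in values)
    scale = largest / max_int if largest else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    # Recover approximate floats; the rounding error is at most scale / 2.
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.05, 0.0]
q_weights, scale = quantize(weights)
restored = dequantize(q_weights, scale)
```

The trade-off is a small loss of precision in exchange for less memory and cheaper integer math, which matters on the constrained devices listed under Portability.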
No silver bullet

Amazon mxnet:

Why another framework?
Core philosophy of mxnet
Current state of industry
Future direction
Torch next generation
Another vote for sharing components

Thank You AIFrontiers Organizers

It was an excellent conference, with well-chosen topics and the best speakers imaginable – the platform creators themselves. People who were expecting deep-dives or technical details were disappointed, but it was a great “state of the industry” conference for people like me who know nothing about the topic.

Thank you to the conference organizers, the Silicon Valley AI and Big Data Association and all the sponsors.

Ending Note

Geoffrey Moore (author of “Crossing The Chasm”) says:

In the coming decade all global enterprises, both private and public, will target the trapped value in their ineffective and inefficient outward-facing relationships with their targeted constituencies, be they consumers, clients, customers, patients, students, or citizens. Authentic sustainable engagement will become the new scarce ingredient. The as-a-service model will expand from commodity transactions to incorporate more significant life interests as well—education, health, personal development, family relationships, wealth management, safety and security, and the like. Machine learning and artificial intelligence will be the new keys to the kingdom, enabling institutions to operate at global scale with unprecedented speed, relevance, and accuracy. Operating models will prioritize customer relationship effectiveness over the supply chain efficiency, causing CRM to displace ERP as the most prominent information system, and the hot expertise will lie in user experience design, data analytics, machine learning, and artificial intelligence.

These are my quick jottings during the talks at PGConf SV today:

Citus DB (distributed postgresql) will be open sourced

citusdb is going open source as a PostgreSQL extension #pgconfsv – Josh Berkus

First applause of day as @umurc announces CitusDB is going open source. #PGConfSV – merv

Everybody loves Kafka

Lots of Kafka love here at #pgconfsv Seems like Postgres + Kafka is a love match right now … – Josh Berkus

Hasura says JSON > SQL

An intriguing consulting company from India. I didn’t get a chance to talk to them, but the gist is that they provide a MongoDB-like JSON querying interface on top of RDBMS databases.

Update: There’s also PostgREST which is an open source project in Haskell that is similar (via the awesome-postgres list).

TripAdvisor runs on Postgresql

Matthew Kelly of TripAdvisor.

4 datacenters. 100 dedicated Postgres servers. 768 GB RAM. Multi-terabyte databases. 315 million unique visitors per month.

Switching from DRBD to streaming replication.

Switching Collation: utf.en-us -> C, because glibc keeps changing character sorting, which affects indexes.
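To see why the “C” collation is stable, it helps to remember that it compares raw bytes and applies no language rules. The sketch below imitates that in Python; the case-insensitive ordering is only my simplified stand-in for a language-aware collation, since real locale rules are more involved and depend on the glibc version.

```python
names = ["apple", "Banana", "cherry", "Apple"]

# Roughly what the "C" collation does: compare raw bytes, no language rules.
c_order = sorted(names, key=lambda s: s.encode("utf-8"))

# Simplified stand-in for a language-aware collation (assumption: case-insensitive,
# with ties broken by the original string).
locale_like_order = sorted(names, key=lambda s: (s.lower(), s))
```

Byte order never changes between library versions, so an index built with it does not silently become inconsistent after an OS upgrade.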

Switching Hardware: RAM -> SSD

Cross datacenter replication is done by custom trigger-based replication.

Hopes to see BDR in core.

Active/Passive model of sites – two fully functional sites, keep flipping active role. Secondary site used for disaster recovery, load testing, hardware upgrades, etc.

Development environments – weekly dump restores of all schema and all non-PII (?) data into 3 mini sites – dev, prerelease and test lab. 36+ hour process that completes every weekend.

System Tuning:

  • Always separate your WAL, data and temp partitions onto different disks, even on SSDs.
  • Make sure your kernel thinks your SSD array isn’t a spinning disk array.

Cache Statements:

  • 60% CPU savings by properly caching prepared statements.

Cascading Failures:

  • Statement timeout is a must
  • Separating read and write threadpools
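Here is a small client-side sketch of both ideas, assuming a Python application. The pool sizes, the helper `run_with_timeout` and the fake statements are all hypothetical; on the server side PostgreSQL has its own `statement_timeout` setting, which this sketch does not replace.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Separate pools so a flood of slow writes cannot starve reads (and vice versa).
read_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="read")
write_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="write")


def run_with_timeout(pool, statement, timeout_s):
    """Run `statement` (any callable) and give up waiting after `timeout_s` seconds."""
    future = pool.submit(statement)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        future.cancel()  # only helps if the task has not started yet
        return None


def fast_read():
    return "row"


def slow_write():
    time.sleep(0.5)  # stands in for a statement stuck behind a lock
    return "written"


read_result = run_with_timeout(read_pool, fast_read, timeout_s=0.2)
write_result = run_with_timeout(write_pool, slow_write, timeout_s=0.05)
```

The stuck write gives up quickly and never touches the read pool, so one slow dependency does not cascade into the whole site.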

Standard Hardware:

  • From 256-768 GB RAM & 15K spinning drives to 256GB RAM & enterprise-grade SSDs
  • Next bottleneck
    • Kernel version – requires Puppet upgrade + moving to systemd
    • 1 Gbps networking isn’t enough

Prestogres – connecting presto query engine via postgresql protocol to visualization tools

Sadayuki Furuhashi of Treasure Data. Also created MessagePack and Fluentd.

Before: HDFS -> Hive daily/hourly batch -> Postgresql -> Dashboard / Interactive query

Now: HDFS -> Presto -> Dashboard

Presto distributed query engine from Facebook. Connects to Cassandra, Hive, JDBC, Postgres, Kafka, etc.

Why Presto? Because elastic. Adding a server improves performance instantly. Scale performance when we need. Separate computation engine from storage engine.

Why Presto over MapReduce? Because:

  • memory-to-memory data transfer
    • no disk IO
    • data chunk must fit in memory
  • all stages are pipelined
    • no wait time
    • no fault tolerance
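The pipelining point can be pictured with Python generators: every stage hands rows to the next one in memory as soon as they are produced, with nothing written to disk in between. This is only an analogy of the idea, not Presto's actual implementation, and the stage names are made up.

```python
def scan(rows):
    # Source stage: emits rows one at a time.
    for row in rows:
        yield row


def filter_stage(rows, min_amount):
    # Starts working as soon as the first row arrives; no intermediate files.
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def total(rows):
    # Final stage: consumes the stream and aggregates it.
    return sum(row["amount"] for row in rows)


orders = [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}, {"id": 3, "amount": 20}]
result = total(filter_stage(scan(orders), min_amount=10))
```

The flip side is the same as in the list above: if any stage fails midway, there is no saved intermediate result to restart from.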

Writing connectors for data visualization & business intelligence tools to talk to Presto would be a lot of work, so why not create a Postgresql protocol adapter for Presto.

Other possible designs were:

  • MySQL protocol + libdrizzle : But Presto has syntax differences with MySQL
  • Postgresql + Foreign Data Wrapper : JOIN and aggregation pushdown is not available yet

Difficulties in implementing the Postgres protocol:

  • Emulating system catalogs : pg_class, pg_namespace, pg_proc, etc.
  • Rewriting transactions (BEGIN, COMMIT) since Presto doesn’t support transactions
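As an illustration of the transaction-rewriting problem, a proxy could simply swallow transaction-control statements and forward everything else. The sketch below is my own guess at the shape of such a filter; the regular expression and `rewrite_statement` are hypothetical and are not taken from the Prestogres source.

```python
import re

# Statements that only make sense with transactions, which (per the talk)
# Presto does not support.
_TRANSACTION_RE = re.compile(
    r"^\s*(BEGIN|START\s+TRANSACTION|COMMIT|END|ROLLBACK)\b[^;]*;?\s*$",
    re.IGNORECASE,
)


def rewrite_statement(sql):
    """Return None for transaction-control statements, else the SQL unchanged."""
    if _TRANSACTION_RE.match(sql):
        return None  # acknowledge to the client, send nothing to Presto
    return sql


incoming = ["BEGIN;", "SELECT count(*) FROM events", "commit"]
forwarded = [s for s in incoming if rewrite_statement(s) is not None]
```

The client tool still believes it is talking to a transactional database, while only the actual query reaches the engine.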

Prestogres design: pgpool-II + postgresql + PL/Python. Basic idea is rewrite queries at pgpool-II and run presto queries using PL/Python.

Uses a patched pgpool-II which creates & runs functions in the postgresql instance that will create system tables & records, and queries will be translated via PL/Python into Presto queries.

Heap Analytics uses Citus DB

Dan Robinson, Heap Inc.

Store every event, analyze retroactively. Challenges:

  • 95% of data is never used.
  • Funnels, retention, behavioral cohorts, grouping, filtering, etc. can’t pre-aggregate.
  • As real-time as possible, within minutes.

5000 customers. 60 TB on disk. 80 billion events. 2 billion users. 2.4 billion events last week. Can’t scale vertically. So Citus DB.

Schema:

  • users – customer id bigint, user id bigint, data jsonb
  • events – customer id foreign key, user id foreign key, event jsonb

Basic Query:

select count(*) from users where customer_id = 123 group by properties ->> 'ab_test_grp' 

Complex queries with joins, group by, etc. done real-time via Citus DB. Citus DB parallelizes the queries among the individual postgres (shard) instances and aggregates them on the master node.
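The fan-out-and-merge pattern can be sketched in a few lines of Python. The in-memory `shards` list, the field names and both functions are assumptions for illustration only; the real system runs SQL on separate postgres instances.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each one holds a slice of the users table.
shards = [
    [{"customer_id": 123, "ab_test_grp": "A"}, {"customer_id": 123, "ab_test_grp": "B"}],
    [{"customer_id": 123, "ab_test_grp": "A"}, {"customer_id": 456, "ab_test_grp": "A"}],
]


def count_on_shard(rows, customer_id):
    # Worker side: the same GROUP BY, but only over the local rows.
    return Counter(r["ab_test_grp"] for r in rows if r["customer_id"] == customer_id)


def count_by_group(customer_id):
    # Master side: fan the query out in parallel, then merge the partial counts.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: count_on_shard(rows, customer_id), shards))
    return sum(partials, Counter())


totals = count_by_group(123)
```

Counts merge cleanly because adding partial counts gives the same answer as counting everything in one place.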

They make use of postgresql partial indexes (indexes that cover only the rows matching a WHERE clause), created when the customer creates the query, for performance. This works well because the data is sparse.
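A partial index is easy to try out. The sketch below uses SQLite as a stand-in only because it ships with Python; the `CREATE INDEX ... WHERE` statement has the same form in PostgreSQL. The table, column and index names are hypothetical and are not Heap's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "view"), (2, "click"), (2, "view"), (2, "view")],
)

# Partial index: only rows matching the WHERE predicate are indexed,
# so the index stays small when the matching rows are sparse.
conn.execute("CREATE INDEX idx_clicks ON events (user_id) WHERE type = 'click'")

click_count = conn.execute(
    "SELECT count(*) FROM events WHERE type = 'click' AND user_id = 2"
).fetchone()[0]
```

A query can only use such an index when its own WHERE clause implies the index predicate, which is why it pairs well with queries that customers define up front.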

Make use of user-defined functions (UDFs), e.g. to analyze whether a user matches a funnel.

Where does data live before it gets into the Citus DB cluster? -> Use Kafka as a short-term commit log.

Kafka consumers make use of Postgres UDFs to make writes commutative and idempotent. Makes use of user exists checks, upserts, updates, etc.

Sharding by user, not time range. All shards written to all the time. How do we move shards, split shards, rehydrate new replicas, etc.? Use Kafka commit number to replicate the data and replay data after that commit number.
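Here is how I picture the replay trick, together with the idempotent writes mentioned earlier. A plain Python list stands in for the Kafka log and its index for the commit number; the record layout and function names are my own assumptions.

```python
# Hypothetical commit log: the list index plays the role of the Kafka commit number.
log = [
    {"user_id": 1, "event": "signup"},
    {"user_id": 2, "event": "signup"},
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": "click"},
]


def apply(state, record):
    # Idempotent write: a set ignores an event it has already seen.
    state.setdefault(record["user_id"], set()).add(record["event"])


def rebuild_replica(snapshot, snapshot_offset, log):
    """Copy a snapshot, then replay every record at or after its offset."""
    state = {user: set(events) for user, events in snapshot.items()}
    for record in log[snapshot_offset:]:
        apply(state, record)
    return state


# Snapshot taken after the first two records were applied (offset 2).
snapshot = {1: {"signup"}, 2: {"signup"}}
replica = rebuild_replica(snapshot, snapshot_offset=2, log=log)

# Replaying from an earlier offset is harmless because the writes are idempotent.
replica_overlap = rebuild_replica(snapshot, snapshot_offset=0, log=log)
```

Because applying a record twice changes nothing, the replay point does not have to be exact, only early enough.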

Future Work:

  • Majority of queries touch only last 2 weeks of data – can we split out recent data onto nicer hardware?
  • Numerical analysis beyond counts – min, max, averages, histograms
  • Richer analysis, more behavioral cohorting, data pivoting, etc.
  • Live updates

How real-time is it? Events are ingested within minutes.

MixRank on Terabyte Postgresql

Scott Milliken, founder of MixRank.

Low maintenance thanks to Postgresql, compared to war stories with newer big data solutions.

Vacuum can change query plans and cause regressions in production.

For a small percentage of queries, we cannot predict what the query planner will do, so we try all the plans. Use CTEs (Common Table Expressions) to force different plans, race them, kill the losers. Ugly but surprisingly effective. Implemented generically using our own higher-level query planner. Why CTEs? Because they are an optimization boundary.
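The “race them, kill the losers” part can be sketched with threads. This is not MixRank's code: the plans are simulated with sleeps, the names are made up, and a shared cancel flag stands in for whatever a real implementation would do to cancel the losing queries on the server (my guess is something like `pg_cancel_backend`).

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def make_plan(name, seconds, cancelled):
    def run():
        # Stand-in for executing one query variant; polls the cancel flag.
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            if cancelled.is_set():
                return None
            time.sleep(0.005)
        return name
    return run


def race(plans):
    """Run all candidate plans at once and return the first one to finish."""
    cancelled = threading.Event()
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        futures = [pool.submit(make_plan(n, s, cancelled)) for n, s in plans]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        cancelled.set()  # "kill the losers"
        return next(iter(done)).result()


winner = race([("seq_scan_plan", 0.30), ("index_plan", 0.02)])
```

The cost is the extra load from running every variant, which is presumably why it is reserved for the small share of unpredictable queries.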

Use SQLAlchemy. We don’t use the ORM parts, we use it as a DSL on top of SQL. So dynamically introspect the queries and do permutations to generate the different plans. Don’t try to generate different query plans by hand, that will be hard to maintain. One way to do this is to query the pg_class table to figure out which indexes are present, and generate permutations to use different indexes.

Comment from audience: You can write your own C module and override postgresql to use your own query planner.

Batch update, insert, delete queries are a great substitute for Hadoop (for us). But correct results can lag and performance can suffer.

Schedule pg_repack to run periodically, not vacuum full.

You can scale a single postgres pretty far, more than you think. We have 1 box (good dedicated hardware) with 3.7 GB/s. Good dedicated hardware performs 10-100 times better than the alternatives, i.e. 1-2 orders of magnitude.

Using lz4 for ZFS compression results in a 43% smaller data size.

Amazon RDS for PostgreSQL : Lessons learned and deep dive on new features

Grant McAlister, Senior Principal Engineer, AWS RDS.

What’s new in storage:

  • From 3TB limit to 6TB
    • PIOPS limit is still 30K
  • Encryption at rest
    • Uses AWS Key Management Service (KMS), part of AWS IAM
    • Includes all data files, log files, log backups, and snapshots
    • Low performance overhead, 5-10% overhead on heavy writes
      • Will reduce over time because Intel CPUs are getting better on offloading encryption
  • Unencrypted snapshot sharing, even sharing to the public

Major version upgrade to Postgresql 9.4, uses pg_upgrade. Recommendation: Test first with a copy instance. Will also help you figure out how much downtime to expect.

Use rds_superuser_reserved_connections to reserve connections for admin purposes.

Use pg_buffercache to determine working set memory size.

Use AWS Database Migration Service (DMS) to move data to the same or a different database engine, e.g. from customer premises to AWS RDS. It takes about 11 hours for a terabyte, depending on your network speed. Postgresql must be at least version 9.4 because DMS uses the logical decoding feature. DMS is in Preview release now.

Use AWS Schema Conversion Tool (SCT) to migrate stored procedures, etc.

Scale and Availability:

  • A select query first checks shared_buffers for the buffer; if it is not there, it is loaded from the pagecache/disk; if it is not there either, it is loaded from EBS.
    • shared buffers = working set size
  • Have replicas in different availability zones, i.e. multi-AZ
  • Use DNS CNAMEs for failover, takes 65 seconds
  • Read replicas = availability
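The lookup order in the first bullet can be modelled as a tiered cache. Plain dictionaries stand in for the three tiers, and there is no eviction, so this is a deliberately simplified sketch; the function `read_page` and the tier labels are my own.

```python
def read_page(page_id, shared_buffers, page_cache, ebs):
    """Return (page, tier), looking in the fastest tier first."""
    if page_id in shared_buffers:
        return shared_buffers[page_id], "shared_buffers"
    if page_id in page_cache:
        page = page_cache[page_id]
        tier = "page_cache"
    else:
        page = ebs[page_id]  # slowest path: network-attached storage
        page_cache[page_id] = page
        tier = "ebs"
    shared_buffers[page_id] = page  # keep it close for the next read
    return page, tier


shared_buffers, page_cache, ebs = {}, {}, {7: "page-7"}
first = read_page(7, shared_buffers, page_cache, ebs)
second = read_page(7, shared_buffers, page_cache, ebs)
```

This is why sizing shared buffers to the working set matters: every read that stays in the top tier avoids a trip to EBS.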

Burst mode: GP2 & T2

  • Earn credits when performance below base
  • If < 10,000 transactions per second, using burst mode will cost much less than PIOPS
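The credit mechanism can be simulated with a simple balance: earn while running below the baseline, spend while bursting above it. The numbers and the function below are hypothetical and are not AWS's actual accounting rules.

```python
def simulate_credits(usage_per_tick, baseline, max_credits, start_credits=0):
    """Track a credit balance: earn below the baseline, spend above it."""
    credits = start_credits
    served = []
    for wanted in usage_per_tick:
        if wanted <= baseline:
            credits = min(max_credits, credits + (baseline - wanted))
            served.append(wanted)
        else:
            burst = min(wanted - baseline, credits)  # cannot spend more than saved
            credits -= burst
            served.append(baseline + burst)
    return served, credits


served, remaining = simulate_credits(
    [20, 20, 180, 180, 180], baseline=100, max_credits=500
)
```

In this run the two quiet ticks bank enough credits for two bursts, and the third burst is throttled back to the baseline once the balance hits zero.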

Cross-region replication is being planned. Currently, you can copy snapshots across regions.