PodSync.net converts a YouTube channel into a podcast.
This is a beautiful bridge that lets me listen to DJ mixes by “Confused bi-Product of a Misinformed Culture” without having to use YouTube.
Four years ago, I migrated this blog from WordPress to Jekyll, with the intention of using whatever format I want to use inside Emacs… Subsequently, my posting rate dropped drastically to just 13 posts in 4 years!
I don’t think that was a coincidence. Tools matter.
I believe the speed and ease of writing dropped drastically. Even simple steps like using photos in a post meant using a separate tool such as Finder.app (on macOS) or the command line to move them to the right directory and then linking to them from the main post. In WordPress, that’s one drag-and-drop and done.
Similarly, having no comments was demotivating as well. While there tends to be more nitpicking these days, I would still like to benefit from the wisdom of the crowds.
So now I have migrated back to WordPress. Let’s see how this goes.
A few months ago, Mayank convinced me to get some Ether (Ethereum cryptocurrency) because it was going to go on a bull run, thanks to high-profile companies backing Ethereum by joining the Enterprise Ethereum Alliance (EEA). So I did. And that event did happen – the members include Microsoft, Intel, MasterCard, Cisco, JP Morgan and the State of Andhra Pradesh – and yes, Ethereum went through a bull run (to $336 per ETH, as of this writing).
That’s when I started going down the rabbit hole of the cryptocurrency space 😬
The way I understand it is that cryptocurrency is digital money. So why is it different from PayPal or Paytm? Because this is not a national currency like rupees or dollars, this is a currency “for the people, by the people, of the people”. No government has sanctioned it or vouches for it. Sounds nuts, right?
But that’s what’s so exciting. Think of how people tinkering with technology can start a transformation, like Steve Wozniak designing the Apple I or Tim Berners-Lee creating the world wide web. People are now tinkering with creating a virtual currency that nobody can control, except by the participants agreeing to make changes, which makes it democratic and hence chaotic at the same time.
To learn what Ethereum is, see “WTF is Ethereum?”
So all this got me curious about things at an implementation level (yep, it’s an ongoing theme with me). So, again, via Twitter, when I read that Blockchain @ Berkeley was hosting an Ethereum dev bootcamp, I signed up!
Note that I could probably have learned the same stuff online, for example by going through Blockchain @ Berkeley’s Decal videos. I just preferred a 2-day immersion, so I went to the in-person course.
The first day was an introduction and tons of questions by the audience, covering everything from architecture to economics and incentives to security. Then we got an introduction to the Solidity language and used the Truffle framework to practice writing a simplistic ecommerce shop smart contract.
The second day was an overview of oracles, web3.js, MetaMask, security (how not to ICO) and authentication. There was so much to absorb here.
Special thanks to the instructors Ali Mousa and Collin Chin for a useful course. In fact, they had just finished a smart contract project on an internal supply-chain system for Airbus, and had plenty of practical advice to offer.
There are many dangers lurking such as cryptos being disinflationary, so be careful with investing in ICOs.
The idealist in me really wonders if all of this is really happening. People are actually working to decentralize the web and, on top of that, raising more money democratically via Initial Coin Offerings (ICOs) than through traditional venture capital. Even creating new kinds of venture enablers. But I do wonder about actual user adoption. I guess this is a “build it and they will come” excitement.
There’s still a long way to go to make the development tools and the ecosystem better and safer though. Every podcast I’ve heard describes the current state as the “dialup days of decentralized web (web3)”.
Even then, all the nerds are excited. Why? Because we are so used to accessing databases like Facebook or Google via the Internet, this is the first time that we have a database built as a protocol on top of the Internet, and hence it is decentralized. And this database can act as money and a financial system, which means money can be democratized which has never happened before. There’s a reason why kings and governments are the only ones who can print money – because it means power.
Now take a decentralized database and decentralized money and put decentralized smart contracts on top of it (via Ethereum) and you can get two parties to do business with each other without the need for trusted third parties, like banks! Smart contracts will destroy the current idea of a legal system, the current idea of a law firm and of a lawyer. Take it one step further and you can run entire companies on Ethereum – everything from cap table, governance, fundraising, payroll, accounting to bylaws and running entire communities. Maybe someday we can replace “don’t be evil” with “can’t be evil”. Consider me mind-blown. The proof of the pudding is that right now you can work with a freelancer via an Ethereum-based platform.
If you don’t know what machine learning is, just know this from the book “Deep Learning with Python” by Francois Chollet (creator of Keras):
After attending the AI Frontiers conference at the beginning of this year, I was amazed, fascinated and befuddled about what machine learning, deep learning and all of the associated buzzwords actually are at an implementation level. I wanted to learn more about this. So, on a whim, I downloaded the TWiML podcast to listen to during my commute and happened to be listening to an interview with Siraj Raval. Next thing you know, I checked out Siraj’s YouTube channel and followed him on Twitter. On Twitter, he kept talking about big news coming up in a few days, and it turned out that he was co-creating Udacity’s Deep Learning Foundation course (a MOOC). I was excited by Siraj’s and Mat’s intro video, and I immediately signed up and waited in anticipation.
The good part about the course was that there is a weekly schedule of lessons and projects. As I keep saying to friends and colleagues, nothing in the modern world ever gets done without a deadline (don’t tell your boss that).
The bad part was that the course was literally being built while we were enrolled, so we would see a mad rush by the instructors to write and create the content every week for the upcoming week. That was okay by me: getting introduced to a topic that has only become feasible in recent years, in a way that is accessible to people who don’t have a Ph.D. in machine learning, was exciting and I was grateful.
In the first few weeks, we dove into Anaconda (I’ve been doing Python for 10 years and had never heard of it), Jupyter notebooks (again, had never paid attention to or used them before), and started learning about perceptrons and neural networks. I was lost in the first few weeks. The course was advertised as 3 hours / week, which was clearly insufficient. I had to spend ~15 hours/week to catch up on the course and make sense of it all.
Like a tortoise, slowly I caught up, and by reading Andrew Trask’s brilliant introductory book, which was the course’s prescribed textbook, I started understanding a little. We started off mostly with supervised learning, where we provide the training data set and the expected output. The lessons got into higher gear with learning convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The way I understood it, CNNs are useful for working on the full input such as individual images, because you’re extracting and condensing patterns with several layers and getting a condensed representation of the full input. RNNs are useful for sequences where there is a dependency, such as text, where a sentence can depend on a previous sentence.
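To make the CNN vs RNN distinction concrete, here is a minimal pure-Python sketch. It is only an illustration under simplifying assumptions: the function names, the filter values and the weights are hypothetical, and real networks learn their weights from data and apply a non-linearity that is left out here.

```python
def conv1d(signal, kernel):
    # CNN idea: slide one small filter across the whole input to detect a local pattern.
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def rnn_states(sequence, w_in=0.5, w_state=0.5):
    # RNN idea: each step mixes the new input with a state carried over from earlier steps.
    state, states = 0.0, []
    for x in sequence:
        state = w_in * x + w_state * state
        states.append(state)
    return states


# The filter [-1, 1] responds where the signal rises or falls.
edges = conv1d([0, 0, 1, 1, 0], [-1, 1])
print(edges)

# The first input keeps echoing in later states, which is the "dependency" on earlier items.
memory = rnn_states([1.0, 0.0, 0.0])
print(memory)
```

The convolution sees the whole input at once through the same small filter, while the recurrent loop has to walk the sequence in order because each state depends on the previous one.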
Whenever motivation was low, Siraj’s videos kept the enthusiasm and fascination flowing!
The projects also kept me going throughout the course because that’s where the understanding is really put to the test. Since I was taking copious notes during the course, I was forced to pay attention to the details, and that helped a lot during the project.
The lessons combined with the great idea of using a forum and dedicated forum mentors who guide you on questions that you have, about both lessons and projects, was just a perfect learning environment. I can’t thank the forum mentors enough.
The last topic of the course was generative adversarial networks (GANs), a type of unsupervised learning, which is actually a relatively recent concept: the paper came out in 2014! It applies game theory to neural networks by making two neural networks compete with each other. The generator creates new patterns and the discriminator (trained on real data) decides whether they are realistic enough or not, forcing the generator to create realistic data after sufficient training.
Unfortunately, life happened, and I was delayed by a month in working on the last project. So it took immense effort to get back into the groove. The project was to generate faces! Imagine that! That invigorated me and I was so glad to finally see this screen:
There were plenty of other concepts we learned along the way, such as autoencoders and reinforcement learning. It would take an entire article to list all the concepts we encountered.
I’m thankful to Udacity for this course. I could see that not all students were satisfied with it, but it was oh so worth it for me. Getting introduced to data science, machine learning and deep learning in a few months has been a gruelling and happy experience.
I signed off in the course Slack community with this:
NOTE: This story here is my personal perspective, it does not represent the views of my employer.
5 years ago, I joined Thejo on his (then) next adventure, Automatic, which launched 4 years ago. That story is here. The premise sounded interesting – what can you do when you tap into the data generated from your car? The vision was “owning a car can be safer, cheaper, and smarter”.
Two years ago, we had a real API and events platform and mobile apps that our customers were happy with. Customers especially use it with the IFTTT integration to do things like log their business trips to a spreadsheet for expense reporting, send SMS messages to friends or family, switch their thermostat at home on or off, and so on.
Last year, we launched our 3G version of the device. I personally built our core ingest servers that take in all the real-time data being uploaded from our connected devices plugged into cars, massage that data and send it down to all the internal microservices, and we’re talking lots of different types of data and interaction models. That core ingest server is now the foundation of all our products. It was a fun and challenging project.
Automatic then bounced back with partnerships such as with American Family insurance to take usage-based insurance forward.
Today, the exciting news is that Sirius XM has acquired Automatic for over $100M to take the product forward in a far bigger way than was possible for a startup! And already our customers love it.
What makes Sirius XM interesting?
It has been a privilege to work in the trenches with Thejo (the visionary, the deal maker), Dr. Jerry (putting the science in data), Ljuba (how to do UX right), Ram J (the original 10x engineer), and several other brilliant folks.
I’m glad the Automatic story continues, and strongly. To the future, the connected car!
An overview of these companies:
 Alfred now has a Mega Supporter License with lifetime free upgrades.
My soothsayer friend BG told me last year that “deep learning is the next big thing”. I didn’t know what that meant. A few days ago, I attended the AIFrontiers conference in Santa Clara, California. Now I have a glimpse of what he meant :-)
In this context, I interpret “intelligence” as “smart”. Yes, we have smart phones, smart TVs, and smart speakers. But imagine way smarter software and devices… like self-driving cars!
Note that Artificial Intelligence is about understanding intelligence. Machine Learning is a “brute force” data-driven approach to simulating intelligence. They are related but not the same thing. There are many areas that will lead to Artificial General Intelligence (AGI), which means “a software that can do any task”, as opposed to Machine Learning, which creates software that can do specific tasks. This conference was about Machine Learning, and specifically Deep Learning.
To summarize the scope of the areas, Artificial Intelligence > Machine Learning > Deep Learning.
The mantra at this conference was that we will move from a software stack to an intelligence stack to solve future engineering challenges.
This was best explained by the legendary Jeff Dean in his keynote speech, talking about how many products at Google use deep learning:
Machine learning is one technique to achieve intelligence.
What is machine learning? My understanding is: it is about making computer programs whose behavior is learned from data instead of solely based on lines of code written by humans. Think spam filters – whenever we click on “Spam” or “Not Spam” buttons, the spam filtering system learns from this and the behavior changes over time to reflect that, without somebody explicitly writing code for every single email. On top of this idea, design the system to learn by itself, and it can learn and improve orders of magnitude faster.
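The spam filter idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real mail provider does it: the class name, the word-count scoring and the sample messages are all made up for illustration. The point is only that the filter's behavior comes from the feedback it receives, not from hand-written rules per email.

```python
from collections import Counter


class TinySpamFilter:
    """Toy filter whose behavior comes from user feedback, not hand-written rules."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "not_spam": Counter()}

    def feedback(self, text, label):
        # Called when a user clicks "Spam" or "Not Spam" on a message.
        self.word_counts[label].update(text.lower().split())

    def is_spam(self, text):
        # Score each word by how much more often it was seen in spam than in normal mail.
        score = 0
        for word in text.lower().split():
            score += self.word_counts["spam"][word] - self.word_counts["not_spam"][word]
        return score > 0


filt = TinySpamFilter()
filt.feedback("win a free prize now", "spam")
filt.feedback("free money win big", "spam")
filt.feedback("lunch meeting moved to noon", "not_spam")

print(filt.is_spam("claim your free prize"))     # words seen mostly in spam
print(filt.is_spam("meeting at noon tomorrow"))  # words seen mostly in normal mail
```

Nobody wrote a rule saying “free” is suspicious. The filter picked that up from the clicks, and more clicks would keep changing its behavior.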
What makes Machine Learning special? Because the system is now learning behaviors that are more accurate for the task and can handle more situations than the algorithms we humans could have imagined! Think converting sentences from one human language to another, self-driving cars, etc. Think of all the situations that such systems need to handle. We could not have written code to handle every situation.
Why now? Because machine learning requires:
What is deep learning? It is a machine learning technique that is based on “layers of neurons”, i.e. think of millions of neurons in your human brain that work together to understand, perceive, store knowledge… deep learning tries to simulate your brain. At least, that’s the way I understood it.
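As a rough picture of “layers of neurons”, here is a minimal forward pass in plain Python. The weights below are hypothetical, hand-picked numbers. A real deep learning system would learn them from data and would use a library such as TensorFlow instead of nested lists.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    # Each neuron: weighted sum of all inputs plus a bias, squashed by sigmoid.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # "Deep" just means the output of one layer feeds the next one.
    activation = inputs
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward([1.0, 0.0], network)
print(output)
```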
Jeff Dean talked about their first internal machine learning system, the problems they faced, and what they ideally wanted:
And eventually they designed TensorFlow to achieve those desirable features.
He went on to mention the algorithms they use for different products, which I found interesting, not because I understood what they meant, but because they are pointers in case you want to learn more. After all, the whole point of attending conferences and meetups is to know what is happening out there.
Some of these models can be found at https://github.com/tensorflow/models.
Jeff Dean also mentioned the kind of impact they have had on products, esp. converting April Fool’s Day jokes into reality:
Jeff Dean expects more reuse of machine learning-developed models across different tasks, described as zero-shot learning:
And more compute-based model generation:
Jeff Dean also gave a glimpse of what kind of queries they hope to achieve in the future:
There was a lot of info throughout the day, so I’ll only post what I found were interesting topics / slides in the discussions:
Baidu also played videos of their self-driving cars in China, so this is not just a USA-only phenomenon. China, indeed, may have an edge in AI.
This is a reason why I feel C++, the beast, is making a comeback – because performance and efficient hardware usage are important again, because we now have to run a lot of processing on the Internet of Things, especially self-driving cars. And because it’s C++, correctness becomes a new risk. This might give a clue as to why Tesla Motors attracted Chris Lattner, the creator of the LLVM compiler. Speculation is that Tesla Motors wants to build an integrated autopilot system from chip to compiler.
From Google creating custom chips called Tensor Processing Units (“TPU”) for machine learning model generation in the cloud, to NVidia making chips for self-driving cars, to Intel releasing its Go platform containing 5G modems and chips for self-driving cars, efficient and performant chips for machine learning have become important. This helps explain why NVidia’s shares went up 225% in 2016.
This is very familiar to me because that is what we do at Automatic.
He talked about how deep learning has dramatically changed the field of NLP. Focused on “end-to-end” deep learning methods.
An example of using computer vision is from Jeff Dean’s keynote speech – https://www.google.com/get/sunroof – enter your address, it will tell you how much roof area you have and how much money you can save by switching to solar energy!
OpenCV is a popular open source computer vision library:
Vion Vision was the most interesting. They are deploying machine learning models to devices like cameras. They demonstrated their bus-counting cameras that help bus operators get real-time traffic data so that they can deploy more buses on high-traffic routes, etc. They even had a demo of public-area cameras that auto-detect a crowd beating up a person and send an alert to the local police station.
This was an amazing session where creators or prominent members of each Deep Learning Framework came up and talked about their thoughts on the framework status and future.
It was an excellent conference, with well-chosen topics and the best speakers imaginable – the platform creators themselves. People who were expecting deep-dives or technical details were disappointed, but it was a great “state of the industry” conference for people like me who know nothing about the topic.
In the coming decade all global enterprises, both private and public, will target the trapped value in their ineffective and inefficient outward-facing relationships with their targeted constituencies, be they consumers, clients, customers, patients, students, or citizens. Authentic sustainable engagement will become the new scarce ingredient. The as-a-service model will expand from commodity transactions to incorporate more significant life interests as well—education, health, personal development, family relationships, wealth management, safety and security, and the like. Machine learning and artificial intelligence will be the new keys to the kingdom, enabling institutions to operate at global scale with unprecedented speed, relevance, and accuracy. Operating models will prioritize customer relationship effectiveness over the supply chain efficiency, causing CRM to displace ERP as the most prominent information system, and the hot expertise will lie in user experience design, data analytics, machine learning, and artificial intelligence.
Thank you Mo Lun for creating a brand new Chinese translation of the latest version of A Byte of Python book!
In Mo Lun’s words:
I am a common journalism student from CYU, Beijing. And actually, I am an absolute newbie in Python programming when I start to translate this book. Initially, it was just a whim, but when I done this work, I realized that a decision triggered by interest had prompted me to go so far. With the help of my predecessors’ translations and the vast amount of information provided by the developed Internet, and with the help of my friends, I prudently presented this translation edition. I just hope my translation work will help other newcomers in learning Python. At the same time, I am always waiting for my translation of the comments and suggestions, and ready to change or improve this superficial work.
Note that the full translations list is at https://python.swaroopch.com/translations.html and you can read how to create a new translation at https://python.swaroopch.com/translation_howto.html.
These are my quick jottings during the talks at PGConf SV today:
citusdb is going open source as a PostgreSQL extension #pgconfsv – Josh Berkus
First applause of day as @umurc announces CitusDB is going open source. #PGConfSV – merv
Lots of Kafka love here at #pgconfsv Seems like Postgres + Kafka is a love match right now … – Josh Berkus
Intriguing consulting company from India. Although I didn’t get a chance to talk to them, the gist is that they provide a MongoDB-like JSON querying interface on top of RDBMS databases.
Matthew Kelly of TripAdvisor.
4 datacenters. 100 dedicated Postgres servers. 768 GB RAM. Multi-terabyte databases. 315 million unique visitors per month.
Switching from DRBD to streaming replication.
Switching Collation: utf.en-us -> C because glibc keeps changing character sorting and affects indexes
Switching Hardware: RAM -> SSD
Cross datacenter replication is done by custom trigger-based replication.
Hopes to see BDR in core.
Active/Passive model of sites – two fully functional sites, keep flipping active role. Secondary site used for disaster recovery, load testing, hardware upgrades, etc.
Development environments – weekly dump restores of all schema and all non-PII (?) data into 3 mini sites – dev, prerelease and test lab. 36+ hour process that completes every weekend.
Sadayuki Furuhashi of Treasure Data. Also created MessagePack and Fluentd.
Before: HDFS -> Hive daily/hourly batch -> Postgresql -> Dashboard / Interactive query
Now: HDFS -> Presto -> Dashboard
Presto distributed query engine from Facebook. Connects to Cassandra, Hive, JDBC, Postgres, Kafka, etc.
Why Presto? Because elastic. Adding a server improves performance instantly. Scale performance when we need. Separate computation engine from storage engine.
Why Presto over MapReduce? Because:
Writing connectors for data visualization & business intelligence tools to talk to Presto would be a lot of work, so why not create a Postgresql protocol adapter for Presto?
Other possible designs were:
Difficulties to implement Postgres protocol:
Prestogres design: pgpool-II + postgresql + PL/Python. Basic idea is rewrite queries at pgpool-II and run presto queries using PL/Python.
Uses a patched pgpool-II which creates & runs functions in the postgresql instance that will create system tables & records, and queries will be translated via PL/Python into Presto queries.
Dan Robinson, Heap Inc.
Store every event, analyze retroactively. Challenges:
5000 customers. 60 TB on disk. 80 billion events. 2 billion users. 2.4 billion events last week. Can’t scale vertically. So Citus DB.
users – customer id bigint, user id bigint, data jsonb. events – customer id foreign key, user id foreign key, event jsonb.
select count(*) from users where customer_id = 123 group by properties ->> 'ab_test_grp'
Complex queries with joins, group by, etc. done real-time via Citus DB. Citus DB parallelizes the queries among the individual postgres (shard) instances and aggregates them on the master node.
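The parallelize-then-aggregate pattern can be sketched in Python as a scatter-gather. This is only an illustration of the general idea, not Citus DB's implementation: the in-memory shards, the row layout and the function names are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds the rows for a subset of users.
SHARDS = [
    [{"customer_id": 123, "ab_test_grp": "A"}, {"customer_id": 123, "ab_test_grp": "B"}],
    [{"customer_id": 123, "ab_test_grp": "A"}, {"customer_id": 456, "ab_test_grp": "A"}],
    [{"customer_id": 123, "ab_test_grp": "B"}, {"customer_id": 123, "ab_test_grp": "A"}],
]


def count_on_shard(rows, customer_id):
    # Worker step: a partial GROUP BY computed on one shard only.
    return Counter(r["ab_test_grp"] for r in rows if r["customer_id"] == customer_id)


def count_by_group(customer_id):
    # Coordinator step: fan out to all shards in parallel, then merge the partial counts.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: count_on_shard(rows, customer_id), SHARDS)
        total = Counter()
        for partial in partials:
            total += partial
    return dict(total)


result = count_by_group(123)
print(result)
```

Counts are easy to merge because partial sums just add up. Aggregates like distinct counts or medians need more care at the merge step.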
Making use of postgresql partial indexes (indexes on WHERE queries) when customer creates the query, for performance. This works well because data is sparse.
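A partial index only covers the rows that match its WHERE clause, so it stays small when the data is sparse. The sketch below uses Python's built-in sqlite3 module as a stand-in, since SQLite accepts the same CREATE INDEX ... WHERE form as PostgreSQL. The table layout, the event_type column and the index name are assumptions for illustration, not Heap's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (customer_id INTEGER, user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(123, 1, "signup"), (123, 2, "click"), (456, 3, "signup"), (123, 4, "signup")],
)

# Partial index: only rows with event_type = 'signup' are stored in the index.
conn.execute(
    "CREATE INDEX signup_events ON events (customer_id) WHERE event_type = 'signup'"
)

query = "SELECT count(*) FROM events WHERE customer_id = 123 AND event_type = 'signup'"
count = conn.execute(query).fetchone()[0]
print(count)

# Ask the planner how it intends to run the query (output format varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```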
Make use of user-defined functions (UDFs), e.g. to analyze whether a user matches a funnel.
Where does data live before it gets into the Citus DB cluster? -> Use Kafka as a short-term commit log.
Kafka consumers make use of Postgres UDFs to make writes commutative and idempotent. Makes use of user exists checks, upserts, updates, etc.
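Here is a small Python sketch of what commutative and idempotent writes mean in practice. It is a simplified model under assumptions: a dict stands in for the database, and the event fields and function name are hypothetical. The property to notice is that duplicates and reordering do not change the final state.

```python
def apply_event(store, event):
    """Apply one event so that replays and reordering give the same final state."""
    # Upsert: create the user record if it does not exist yet.
    user = store.setdefault(event["user_id"], {"event_ids": set(), "last_seen": 0})
    # Idempotent: a set ignores an event id it has already recorded.
    user["event_ids"].add(event["event_id"])
    # Commutative: max() yields the same result in any arrival order.
    user["last_seen"] = max(user["last_seen"], event["timestamp"])
    return store


events = [
    {"event_id": "e1", "user_id": 7, "timestamp": 100},
    {"event_id": "e2", "user_id": 7, "timestamp": 250},
    {"event_id": "e3", "user_id": 9, "timestamp": 180},
]

in_order = {}
for e in events:
    apply_event(in_order, e)

# Simulate a consumer that sees events out of order, with one redelivered.
shuffled_with_dupes = {}
for e in [events[2], events[1], events[0], events[1]]:
    apply_event(shuffled_with_dupes, e)

print(in_order == shuffled_with_dupes)
```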
Sharding by user, not time range. All shards written to all the time. How do we move shards, split shards, rehydrate new replicas, etc.? Use Kafka commit number to replicate the data and replay data after that commit number.
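The replay idea can be sketched as follows, assuming a shard copy that remembers the last commit number it includes. The list standing in for the Kafka log, the key/value records and the function name are hypothetical simplifications.

```python
def rehydrate(snapshot_state, snapshot_offset, log):
    """Rebuild a replica: start from a copied snapshot, then replay newer log entries."""
    state = dict(snapshot_state)
    for offset, (key, value) in enumerate(log):
        # Entries at or before the snapshot's commit number are already in the copy.
        if offset > snapshot_offset:
            state[key] = value
    return state


# Hypothetical commit log: the position in the list is the commit number.
log = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d")]

# A shard copy taken right after commit number 1.
snapshot = {"u1": "a", "u2": "b"}

replica = rehydrate(snapshot, 1, log)
print(replica)
```

This works without coordination only if applying an entry twice is harmless, which is why the idempotent writes matter.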
How real-time is it? Events are ingested within minutes.
Scott Milliken, founder of MixRank.
Low maintenance thanks to Postgresql, compared to war stories with newer big data solutions.
Vacuum can change query plans and cause regressions in production.
For a low single-digit percentage of queries, we cannot predict what the query planner will do, so try all the plans. Use CTEs (Common Table Expressions) to force different plans, race them, kill the losers. Ugly but surprisingly effective. Implemented generically using our own higher-level query planner. Why CTEs? Because they are an optimization boundary.
Use SQLAlchemy. We don’t use the ORM parts, we use it as a DSL on top of SQL. So dynamically introspect the queries and do permutations to generate the different plans. Don’t try to generate different query plans by hand, that will be hard to maintain. One way to do this is to query the pg_class table to figure out which indexes are present, and generate permutations to use different indexes.
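The race-and-kill pattern itself can be sketched with asyncio. This is a simulation under assumptions, not MixRank's implementation: the plan names are made up and asyncio.sleep stands in for running a query variant, where a real version would cancel the losing queries on the database side.

```python
import asyncio


async def run_plan(name, seconds):
    # Stand-in for executing one variant of the query; sleep simulates its runtime.
    await asyncio.sleep(seconds)
    return name


async def race(plans):
    tasks = [asyncio.create_task(run_plan(name, secs)) for name, secs in plans.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # kill the losers
    # Wait for the cancellations to finish so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


# Hypothetical plan variants with simulated runtimes in seconds.
winner = asyncio.run(race({"index_scan": 0.05, "seq_scan": 0.5, "bitmap_scan": 0.2}))
print(winner)
```

The cost of this approach is that every plan runs for as long as the fastest one takes, so it trades extra load for a bounded worst case.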
Comment from audience: You can write your own C module and override postgresql to use your own query planner.
Batch update, insert, delete queries are a great substitute for Hadoop (for us). But correct results can lag and performance can suffer.
pg_repack to run periodically, not
You can scale a single postgres pretty far, more than you think. We have 1 box (good dedicated hardware) with 3.7 GB/s. Performance on good dedicated hardware is 10-100 times better than the alternatives, i.e. 1-2 orders of magnitude.
Using lz4 encoding for ZFS compression results in 43% smaller data size.
Grant McAlister, Senior Principal Engineer, AWS RDS.
What’s new in storage:
Major version upgrade to Postgresql 9.4, uses pg_upgrade. Recommendation: Test first with a copy instance. Will also help you figure out how much downtime to expect.
rds_superuser_reserved_connections to reserve connections for admin purposes.
pg_buffercache to determine working set memory size.
Use AWS Database Migration Service (DMS) to move data to the same or a different database engine. From customer premises to AWS RDS. 11 hours for a terabyte, depending on your network speed. Requires at least version 9.4 of Postgresql because it uses the logical decoding feature. In Preview release now.
Use AWS Schema Conversion Tool (SCT) to migrate stored procedures, etc.
Scale and Availability:
Read path: look in shared_buffers; if not there, load from pagecache/disk; if not there, load from EBS.
Burst mode: GP2 & T2
Cross-region replication is being planned. Currently, you can copy snapshots across regions.
Spacemacs is a new distribution of Emacs. Think what Ubuntu did for GNU/Linux – Spacemacs is doing the same for GNU Emacs. It combines all the existing great pieces and provides an easy-to-use, good-looking package.
I used to use my own emacs configuration and then switched to Prelude for its neat Clojure integration, because Bozhidar Batsov wrote CIDER (the Clojure-Emacs package) as well. This was mostly helpful when I was working at Helpshift.
What attracted me to Spacemacs was that it was initially based on evil-mode, a full vi emulation layer inside Emacs. This was great because I was indeed having an Emacs pinky problem. And then the sane key binding hierarchy combined with guide-key for visually seeing that hierarchy was icing on the cake.
I first tweeted questioning whether it’ll be difficult to integrate the rest of the Emacs ecosystem, and then took it as a challenge and added a layer for ERC (the IRC package in Emacs). I was impressed with the layer system of Spacemacs and I was hooked. Then, I added org-pomodoro, org-present, etc.
Elisp hacking has been fun.
Things that I’d like to see improved in Spacemacs:
~/.spacemacs file which is already full of stuff and this confuses new users. New users expect to just copy/paste snippets and it should just work, they will not take the time to read a large config file. For example, there is a layers config variable where users are supposed to add names of the layers they want to use; users should instead be able to copy/paste (enable-layer 'org) and have it work equivalently.
>in evil-mode. Spacemacs needs to make up its mind on whether it’ll fully support holy-mode (vanilla Emacs key bindings), and I hope it does.
develop branch moves fast and many early adopters are using that whereas newbies are using the master branch, and there is often confusion in the chat room when someone asks for help. I wish there was a command in spacemacs that will generate the useful information as text (which operating system, which emacs version, which layers are enabled, holy-mode or evil-mode, etc.) which can be pasted into the chat room and will help others offer advice much faster. Update: I contributed a change to make this happen, and am happy to see its adoption, both in the chat as well as the default issue template.
It’s funny how I was using XEmacs a decade ago, then dived fully into Vim (even wrote a book on it) and now I’m back in Emacs land.
On the same note, I am fascinated with newer editors like GitHub’s Atom, which is gaining traction. It also has a good package management system and a UI built with HTML/CSS, which makes for easy extensibility – a hallmark of a great editor – and opens fascinating new possibilities such as integrating IPython/Jupyter in a Light-Table inspired way. My curiosity about Atom was first piqued because Electron, the core of the Atom editor, is being used by the Slack desktop apps, Microsoft Visual Studio Code, etc.
I don’t know if / when / how I’ll make a switch to Atom, but until then I’m happy with Spacemacs.
I am going to try this out. did not not about it’s existence. I have aliased vim to emacs, just to break my nature.
I use IntellIiJ for packaged things, but vim, subl and emacs for one off. But like to use the powerful features of emacs more.
impressed (especially because of evil-mode). going to try it! @swaroopch thanks for sharing, and contributing!
Great post about spacemacs. Love the editor but I’m struggling with evil-lisp-state. How do you find CLJ develpment on it?
I should try Spacemacs sometime. @swaroopch writes about it … Didn’t know about evil mode before, cc: @shrayasr