Event Diary: Percona Live, Santa Clara CA
Open source databases are the future. Peter Zaitsev, founder of Percona, made this point succinctly at the beginning of his opening keynote at Percona Live, the morning of April 23, in Santa Clara, California (USA).
By way of documentation, Zaitsev offered data compiled by the site db-engines.com, including a global ranking of DBMSs showing open source DBs MySQL, PostgreSQL, and MongoDB in the #2, #4, and #5 spots, with Oracle DB still predictably at #1 and Microsoft SQL Server at #3. While commercial products still ostensibly dominate the SQL-driven relational database market (61.9% vs. 38.1%), the balance is clearly shifting toward open source solutions. Meanwhile, open source dominates in fast-growing, specialized realms like wide-column stores (e.g., Apache’s Cassandra and HBase, the latter modeled on Google BigTable), time series (e.g., InfluxDB), document stores (e.g., MongoDB, Apache CouchDB), and architecturally diverse key-value stores (Redis, etcd, etc.).
This is fine with Percona, which is navigating the waters of open source “coopetition” with a pure-play strategy that respects customer choice and works hard against lock-in. Percona makes deservedly popular, fully open source, freely downloadable distributions of MySQL and MongoDB that work as drop-in replacements for upstream community releases and as alternatives to ‘Enterprise’ spins. They support their own distros commercially, and also provide support for pretty much any other MySQL variant (e.g., MariaDB, founded on the legacy of the original MySQL). And since the big RDBMSs are all now multi-model (i.e., in addition to classic relational tables accessed with SQL queries, they offer JSON-oriented document, key-value, geospatial, graph, and other storage models and query methods), Percona is fast acquiring authoritative expertise in all these specialized areas as well.
Percona Live, their twice-yearly event (held in the US in April and in Europe in November), now reflects this “big tent” vision. Formerly focused almost exclusively on MySQL, Percona Live has expanded to provide a forum for all open source DB makers, community members, and users, plus an expanding universe of ecosystem players (including Opsview) who provide database monitoring and other related services and tools. The show offers three packed days of tutorials, keynotes, and a host of special-interest tracks around major DB types and centers of effort like clouds and containers -- including even a new China track that saw presentations by engineering teams from Alibaba and other China-based centers of innovation.
Opsview’s Head of Innovation, Bill Bauman, and I went to Percona Live in part to speak about and demo our recent laboratory work using Opsview Monitor to gain insight into a container-oriented data center featuring Kubernetes, the OpenFaaS serverless computing platform, and Percona MySQL (see below). We’ll cover that work in detail in upcoming blogs and videos. In this blog, we’ll try to give a small taste of what it was like to attend Percona Live, including some of the amazing folks we met, new products we learned about, and important industry trends. Note: in coming weeks, Percona will release full video and slides of the Percona Live 2018 Santa Clara tutorials and sessions -- so follow them on Twitter (@Percona) and keep checking the Percona Live site for updates.
MySQL 8.0 goes GA
On April 19, with much fanfare (and just four days before Percona Live opened) MySQL version 8.0 entered general availability. Tomas Ulin, VP of MySQL Development at Oracle, summarized the most important new features at a “State of the Dolphin” keynote on Tuesday morning, April 24.
Bill and I had met Tomas the evening before, at the Percona Live opening reception. We’d been surprised to learn that despite version 8.0 being the biggest-ever MySQL release in terms of new features, new tests (over 500), bugs fixed (5000+!) and several other metrics, Ulin felt no need and was under no pressure to aggregate his workforce in one physical location. On the contrary, work on MySQL was being successfully distributed to teams in Norway, in India and around the world. His major concern wasn’t at all about communication and workflow, but rather about ensuring that far-flung team members felt appreciated by customers for the important work they were doing. (Our suggestion to you: join the MySQL community on Slack and let them know you care! Also check out the blogs by Frederic Descamps, MySQL Evangelist at Oracle in Belgium.)
Tomas opened his keynote by describing MySQL 8.0’s new NoSQL, document-oriented database functionality, which supports a Create/Read/Update/Delete (CRUD) API and connectors for a wide range of languages and development environments (e.g., Node.js, Python, C/C++, C#, etc.). The document DB features of MySQL 8.0 can be used in a completely NoSQL, schema-less mode, or developers can extend conventional relational table DBs with freeform elements, mixing SQL and NoSQL queries on the hybrid database’s different parts. A JSON_TABLE function enables flattening of documents into tables for SQL searches. Upcoming MySQL 8.0+ point releases will add support for nested JSON objects, which can also be flattened with the JSON_TABLE function.
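To make the hybrid SQL/document pattern concrete, here’s a minimal sketch using Python’s built-in sqlite3 and its JSON functions as a stand-in for MySQL 8.0 (which exposes the same idea via JSON columns, JSON path expressions, and JSON_TABLE). The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A conventional relational table extended with a freeform JSON document column
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)")
conn.execute(
    "INSERT INTO products (name, attrs) VALUES (?, ?)",
    ("widget", '{"color": "blue", "dims": {"w": 3, "h": 5}}'),
)
# Mix a relational column with JSON path expressions in one SQL query
row = conn.execute(
    "SELECT name, json_extract(attrs, '$.dims.h') FROM products "
    "WHERE json_extract(attrs, '$.color') = 'blue'"
).fetchone()
print(row)  # ('widget', 5)
```

The point is that the schema-bound and schema-less parts of the row are queried together in a single statement, which is exactly the operational simplification multi-model platforms are after.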
MySQL going NoSQL (and then effectively hybridizing NoSQL and SQL) reflects the industry’s continuing trend towards multi-model support. The goal of multi-model is to reduce the operational overhead and application-layer complexity of maintaining, synchronizing, and accessing multiple database platforms to support applications needing more than one type of database functionality. Providing it, however, isn’t simple, and there are delicate trade-offs to be made, both in architecting multi-model DB platforms and in choosing and implementing multi-model solutions over more established, model-dedicated DBMS choices.
Ulin then discussed MySQL 8.0 Common Table Expressions (CTEs) and Window Functions. CTEs are a long-awaited feature (present since around 2005 in MS SQL Server and since 2016 in MariaDB) that enables creation of temporary named result sets, which can then be referenced with SELECT, INSERT, UPDATE, or DELETE. CTEs are great for simplifying complex queries (eliminating derived tables from query bodies, for example). They can stand in for ‘views.’ And they enable use of recursion and self-reference -- powerful tools when working with hierarchical data. Window functions, meanwhile, let you bracket subsets of rows and perform operations like SUM() and COUNT() on them in vector fashion, useful for creating partial sums, moving averages, and performing other analytical tasks.
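Here’s a toy illustration of a CTE feeding a window function, runnable against Python’s built-in sqlite3 (which, like MySQL 8.0, supports both); the ‘sales’ data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# The CTE ('recent') names an intermediate result set; the window
# function then computes a running total over it without collapsing
# the rows the way a GROUP BY would.
rows = conn.execute("""
    WITH recent AS (
        SELECT day, amount FROM sales WHERE day >= 1
    )
    SELECT day, SUM(amount) OVER (ORDER BY day) AS running_total
    FROM recent
""").fetchall()
print(rows)  # [(1, 10), (2, 30), (3, 60)]
```

Note how each input row survives into the output with its partial sum attached -- the “vector fashion” described above.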
MySQL 8.0 provides quite a few new features that improve resilience and transactionality, and enable greater speed and/or control when querying high-traffic databases. A new data dictionary architecture stores metadata in InnoDB tables instead of out in the file system, protecting it from crashes. New NOWAIT and SKIP LOCKED directives let you tell a query to error immediately if it encounters a locked row, or to ignore locked rows in seeking results.
MySQL 8.0 Geospatial functionality is now entirely provided by the open source Boost library set, harnessing the work of hundreds of geospatial and geometry experts to provide robust, fast, high quality geospatial processing, with great support for edge cases.
A number of convenience features have been added to make life easier for DBAs, including the ability to persist, across restarts, global configuration changes of the sort frequently made when tuning for improved performance (e.g., changes to max_connections). This new functionality is made safer by an audit trail showing who made configuration changes and when.
These are just a few of the dozens of new features and improvements made to MySQL in release 8.0. To see everything, check out their white paper.
Toward the Autonomous Database
Last year, Oracle introduced its autonomous database platform: a cloud-based spin on Oracle 18c that maintains and optimizes itself via a combination of heuristics and machine learning. At Percona Live, several speakers described and demonstrated work in similar directions, the goal being to create methodologies whereby databases can manage and optimize themselves and, ultimately, to apply these methods to create self-managing database products and services.
While the idea of using analytical utilities to recommend DB optimizations is well-established, these experiments take the idea several steps further, into true autonomy: databases (or utilities driving databases) that adapt dynamically to changing workloads and conditions too volatile and complex for human operators to master.
Nikolay Samokhvalov, of postgresql.support, based in Campbell, CA, pointed users towards the beta of a trio of AI-based optimization services for database tuning, query optimization, and database resource planning.
Andy Pavlo, a professor at Carnegie Mellon, offered a presentation with the catchy name of Make Your Database Dream of Electric Sheep: Designing for Autonomous Operation, wherein he described criteria for self-driving databases, the creation of a database tuning service (called OtterTune), and a self-driving database system called Peloton. In very general terms, the system works by clustering workloads to create a model of overlapping and distinct requisites, then using TensorFlow to generate predictions from the model, capturing these in the form of physical, data, and execution optimizations that are then further assessed using a moving-window method of finding optima called Receding Horizon Control. Until video of his session becomes available (and even after), it’s worth reading the paper Pavlo and colleagues published last year about this research. Pavlo is brilliant, and also an extremely engaging and funny speaker: his students are lucky ducks.
SQL and Time-Series Hybridization: TimescaleDB
On the show floor, we met Ajay Kulkarni, co-founder and CEO of New York-based TimescaleDB -- a really interesting open source product that extends PostgreSQL into a very high-performance, single-node (with clustering on the way) time-series database. They just raised $16M from Benchmark, NEA, and Two Sigma Ventures and are rapidly building out a global development and services arm.
TimescaleDB takes advantage of the fact that time-series data is essentially immutable post-ingestion, and that storing it involves writes (normally to a recent time interval) rather than updates. Timescale uses a table-of-tables data structure (called a ‘hypertable’ in Timescale-speak) broken up into chunks (indexed tables), each representing a unit of time and a range of a partitioning key -- keeping ‘recent’ chunks in memory to enable rapid writes and avoid swaps to disk. A blog explaining how it works can be found here.
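The chunking idea can be sketched in a few lines of Python. This is a conceptual toy, not TimescaleDB code: a “hypertable” routes each append-only write into a chunk keyed by its time interval, and queries only scan chunks whose interval overlaps the requested range (the one-hour chunk width is invented for illustration).

```python
from collections import defaultdict

CHUNK_SECONDS = 3600  # invented chunk width: one hour

class Hypertable:
    def __init__(self):
        self.chunks = defaultdict(list)  # interval start -> rows

    def insert(self, ts, value):
        # Writes land in the chunk covering their timestamp, so recent
        # chunks stay small and cache-resident
        chunk_key = ts - (ts % CHUNK_SECONDS)
        self.chunks[chunk_key].append((ts, value))

    def query(self, start, end):
        # Prune whole chunks whose interval lies outside [start, end)
        out = []
        for key, rows in self.chunks.items():
            if key + CHUNK_SECONDS <= start or key >= end:
                continue
            out.extend(r for r in rows if start <= r[0] < end)
        return sorted(out)

ht = Hypertable()
for ts in (10, 3700, 7300):
    ht.insert(ts, ts * 2)
print(ht.query(0, 7200))  # [(10, 20), (3700, 7400)]
```

The chunk-pruning step in query() is the key win: a range query touches only the handful of chunks covering its interval, no matter how many billions of rows the table holds overall.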
Apparently this works really well: routine insert rates (on cloud VMs) of 1-2 million metric inserts per second and database sizes of 10+ billion rows, with much larger DB sizes and higher insert rates possible on dedicated hardware with RAID support. In addition to performance, users get the flexibility and power of complete standard SQL (plus enhanced time-series query functions) for composing rich queries; the ease of administering what remains, objectively, a Postgres database; and full compatibility with Postgres utilities and toolkits. See this FAQ for more.
Kubernetes, Serverless, and Percona
As noted above, Bill and I were privileged to present the results of research we’ve been doing on the OpenFaaS platform: an open source, premises-based serverless computing framework that runs on the container orchestrators Docker Swarm and Kubernetes. Our project set up OpenFaaS on a small, production-like multi-node Kubernetes cluster with Weave networking, itself deployed on Linux VMs running on semi-dedicated hardware. We composed Python 3 functions to profile how OpenFaaS stresses its host stack while executing and self-scaling. We performed some experiments using functions to store and query a Percona MySQL 5.7 database, looking to gain some clarity on how FaaS systems potentially interact with standard database architectures. We also explored how this aggregation of resources could best be monitored: using Opsview Monitor 6.0 (Technical Preview) to monitor underlying hardware and Linux performance, Docker engines, Kubernetes services (using a beta Kubernetes OpsPack), and the Percona DB (using Opsview’s excellent MySQL OpsPack). We’ll be posting an edited video version of our presentation shortly, along with OpenFaaS function code snippets and Ansible playbooks enabling Kubernetes+OpenFaaS deployment. Stay tuned!