Distributed Architectures
distributed architectures, big data, elasticsearch, hadoop, hive, cassandra, riak, redis, hazelcast, paxos, p2p, high scalability, distributed databases, among other things...
Curated by Nico

Cassandra error handling done right


Proper error handling with databases is always a challenge when the safety of your data is involved. Cassandra is no exception to this rule. Thanks to the complete control over consistency, availability and durability offered by Cassandra, error handling turns out to be very flexible and, if done right, will allow you to extend the continuous availability of your cluster even further. But in order to reach this goal, it’s important to understand the various kinds of errors that can be thrown by the drivers and how to handle them properly.


To remain practical, this article will refer directly to the DataStax Java Driver exceptions and API, but all the concepts explained here can be transposed directly to other DataStax Drivers.
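To make the error taxonomy concrete, here is a minimal sketch of retry logic that treats each category differently. The exception classes are local stand-ins modeled loosely on the DataStax driver hierarchy, not the real driver API:

```python
# Stand-in exception classes for illustration only (the real DataStax drivers
# expose richer types such as UnavailableException and WriteTimeoutException).
class Unavailable(Exception): pass      # not enough replicas alive: write never started
class WriteTimeout(Exception): pass     # write started, outcome unknown
class InvalidQuery(Exception): pass     # client-side bug: retrying cannot help

def execute_with_retry(operation, idempotent, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Unavailable:
            # The coordinator refused up front, so nothing was applied:
            # always safe to retry (or to fall back to a lower consistency level).
            if attempt == max_attempts:
                raise
        except WriteTimeout:
            # The write may or may not have been applied: only retry when the
            # statement is idempotent, otherwise surface the uncertainty.
            if not idempotent or attempt == max_attempts:
                raise
        except InvalidQuery:
            raise  # a malformed query fails forever; no point retrying

attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise Unavailable()
    return "applied"

print(execute_with_retry(flaky_write, idempotent=True))  # applied, on the third attempt
```

The key distinction is between errors where the write definitely did not happen (retry freely) and errors where it might have (retry only idempotent statements).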


Nico's insight:

One of the main behaviours to understand in a distributed database is how it fails


Metadata: Consistent snapshot analogies


Last week I taught distributed snapshot in my CSE 586: Distributed Systems class. While I teach snapshot, I invariably find myself longing for analogies to provide some intuition about this concept. The global state captured by a distributed snapshot (say using Lamport/Chandy marker algorithm) does not correspond to the "state of the system at initiation of the snapshot". Furthermore, it also may not correspond to a "state of the system from initiation to current state during this computation". This is because while the snapshot taking is progressing in the system, the underlying system computation is also proceeding and changing the state of the system progressively. (Distributed snapshot is not allowed to stop/freeze underlying system computation as that reduces availability.)
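A toy run of the Chandy-Lamport marker algorithm makes the point above tangible: two processes pass "money" back and forth, and the snapshot pieces are recorded at different moments, yet the recorded global state (local states plus in-flight channel messages) is still consistent, conserving the total. An illustrative sketch, not the full algorithm:

```python
from collections import deque

balances = {"A": 100, "B": 50}
recorded_state = {}                  # process -> local state at its snapshot
recorded_channel = {"B->A": []}      # messages A records on channel B->A
channels = {"A->B": deque(), "B->A": deque()}

def send(frm, to, amount):
    balances[frm] -= amount
    channels[frm + "->" + to].append(amount)

send("A", "B", 30)                   # a transfer of 30 is in flight on A->B

# A initiates the snapshot: record its own state, emit a marker, record B->A.
recorded_state["A"] = balances["A"]           # A records 70
channels["A->B"].append("marker")

send("B", "A", 20)                   # B sends before it has seen the marker

# B drains A->B: the transfer arrives first, then the marker.
balances["B"] += channels["A->B"].popleft()   # B is now 60
assert channels["A->B"].popleft() == "marker"
recorded_state["B"] = balances["B"]           # B records 60; A->B recorded empty
channels["B->A"].append("marker")             # B relays the marker to A

# A drains B->A while recording: the in-flight 20 belongs to the cut.
incoming = channels["B->A"].popleft()
balances["A"] += incoming
recorded_channel["B->A"].append(incoming)
assert channels["B->A"].popleft() == "marker" # the marker closes the recording

snapshot_total = sum(recorded_state.values()) + sum(recorded_channel["B->A"])
print(snapshot_total)  # 150: money is conserved in the cut, even though the
                       # two local states were recorded at different moments
```

No instantaneous "state of the system" was ever observed, and the computation never stopped; the snapshot is merely a consistent cut that every conservation law of the computation still respects.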


zimg


A lightweight and high performance image storage and processing system.

Nico's insight:

Probably interesting, but the most interesting part of the documentation is not in English... (http://zimg.buaa.us/documents/#design-and-architecture)


CRATE | Your Elastic Data Store


Crate.IO has built a new breed of database to serve today’s mammoth data needs. Based on the familiar SQL syntax, Crate combines high availability, resiliency, and scalability in a distributed design that allows you to query mountains of data in realtime, not batches. We solve your data scaling problems and make administration a breeze. Easy to scale, simple to use.

Nico's insight:

Starting to hate these marketing sites that don't give you a clue about how the product works, especially when a look at the sources reveals that it is actually a wrapper around Elasticsearch.


Data Replication in NoSQL Databases Explained | Planet Cassandra


Data replication is the concept of having data, within a system, be geo-distributed; preferably through a non-interactive, reliable process. In traditional RDBMS databases, implementing any sort of replication is a struggle because these systems were not developed with horizontal scaling in mind. Instead, these systems can be backed up via a semi-manual process where live recovery wouldn’t be much of an issue. Even so, that framing downplays the complexity of the setup. When dealing with today’s globally distributed data, the former colocated replication concepts will not suffice when implemented at geographic scale.


Today’s infrastructure requires systems that  natively support active and real-time replication, achieved through transparent and simple configurations. The ability to dictate where and how your data is replicated via easily tunable settings, along with providing users with easily understood concepts is what modern day NoSQL databases strive to offer.


Apache Cassandra, built with native multi data center replication in mind, is one of the most overlooked because this level of infrastructure has been assimilated as “tribal knowledge” within the Cassandra community. For those new to Apache Cassandra, this page is meant to highlight the simple inner workings of how Cassandra excels in multi data center replication by simplifying the problem at a single-node level.


This page covers the fundamentals of Cassandra internals, multi-data center use cases, and a few caveats to keep in mind when expanding your cluster.
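The single-node-level simplification the page describes boils down to replica placement on a token ring. Here is a hedged sketch of the idea behind SimpleStrategy: hash the partition key onto the ring and walk clockwise until `replication_factor` distinct nodes are found (NetworkTopologyStrategy layers per-datacenter counts and rack awareness on top of this; the md5 hash here is a stand-in for Cassandra's partitioner):

```python
import hashlib

def token(key, ring_size=2**16):
    # Stand-in partitioner: map a partition key to a position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

def replicas(key, node_tokens, rf):
    # node_tokens: list of (token, node_name) pairs owned by the cluster.
    ordered = sorted(node_tokens)
    t = token(key)
    # First node at or after the key's token; wrap to the start of the ring.
    start = next((i for i, (tok, _) in enumerate(ordered) if tok >= t), 0)
    result = []
    for i in range(len(ordered)):
        node = ordered[(start + i) % len(ordered)][1]
        if node not in result:
            result.append(node)          # walk clockwise, skipping duplicates
        if len(result) == rf:
            break
    return result

ring = [(0, "n1"), (16384, "n2"), (32768, "n3"), (49152, "n4")]
print(replicas("user:42", ring, rf=3))   # three distinct nodes off the ring
```

Multi-datacenter replication reuses exactly this walk, once per datacenter, which is why adding a datacenter is mostly a matter of changing the replication settings.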


Nico's insight:

Great graphs and explanations of how data replication works within Cassandra


CQL Under the Hood

As a reformed CQL critic, I'd like to help dispel the myths around CQL and extol its awesomeness. Most criticism comes from people like me who were early Cassandra adopters and are concerned about the SQL-like syntax, the apparent lack of control, and the reliance on a defined schema. I'll pop open the hood, showing just how the various CQL constructs translate to the underlying storage layer--and in the process I hope to give novices and old-timers alike a reason to love CQL.

Nico's insight:

Having learned Cassandra's model via Thrift, I found CQL hard to understand until these slides


Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

Abstract: Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.

Nico's insight:

It seems Google has such a heavy load that they build a database for each need, and it's starting to be hard to understand how they all differ. The trade-off here seems to be the "near" in "near real-time".


Queues


There are many queueing systems out there. Each one of them is different and was created for solving certain problems. This page tries to collect the libraries that are widely popular and have a successful record of running on (big) production systems.


The goal is to create a quality list of queues with a collection of articles, blog posts, slides, and videos about them. After reading the linked articles, you should have a good idea about: the pros and cons of each queue, a basic understanding of how the queue works, and what each queue is trying to achieve. Basically, you should have all the information you need to decide which queue will best fit your needs.



cockroachdb

cockroach - A Scalable, Geo-Replicated, Transactional Datastore


Overview

Cockroach is a distributed key:value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is homogenous deployment (one binary) with minimal configuration.
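"Versioned values as a first-class feature" is essentially MVCC at the key-value layer: every write is kept under its timestamp, and a read can ask for the value as of any moment. A toy illustration of the idea Cockroach builds on, not its actual storage format:

```python
import bisect

class VersionedKV:
    def __init__(self):
        self.data = {}  # key -> sorted list of (timestamp, value)

    def put(self, key, value, ts):
        # Writes never overwrite: each one adds a new version at its timestamp.
        bisect.insort(self.data.setdefault(key, []), (ts, value))

    def get(self, key, ts=float("inf")):
        # Return the newest version at or before ts (a "snapshot read").
        versions = self.data.get(key, [])
        i = bisect.bisect_right(versions, (ts, chr(0x10FFFF)))
        return versions[i - 1][1] if i else None

kv = VersionedKV()
kv.put("k", "v1", ts=1)
kv.put("k", "v2", ts=5)
print(kv.get("k"))        # v2  (latest version)
print(kv.get("k", ts=3))  # v1  (the value as of time 3)
```

Keeping history this way is what lets a transactional store serve consistent reads at a past timestamp without blocking concurrent writers.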


Nico's insight:

As expected, an open-source project inspired by Google's paper about Spanner is born. To be monitored.

Nice project name, BTW!


Call me maybe: Elasticsearch


Previously, on Jepsen, we saw RabbitMQ throw away a staggering volume of data. In this post, we’ll explore Elasticsearch’s behavior under various types of network failure.


Elasticsearch is a distributed search engine, built around Apache Lucene–a well-respected Java indexing library. Lucene handles the on-disk storage, indexing, and searching of documents, while ElasticSearch handles document updates, the API, and distribution. Documents are written to collections as free-form JSON; schemas can be overlaid onto collections to specify particular indexing strategies.


As with many distributed systems, Elasticsearch scales in two axes: sharding and replication. The document space is sharded–sliced up–into many disjoint chunks, and each chunk allocated to different nodes. Adding more nodes allows Elasticsearch to store a document space larger than any single node could handle, and offers quasilinear increases in throughput and capacity with additional nodes. For fault-tolerance, each shard is replicated to multiple nodes. If one node fails or becomes unavailable, another can take over. There are additional distinctions between nodes which can process writes, and those which are read-only copies–termed “data nodes”–but this is primarily a performance optimization.
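The sharding half of that story comes down to a routing formula in the spirit of Elasticsearch's `shard = hash(routing_key) % number_of_shards`. The hash below is a stand-in (Elasticsearch uses murmur3), but the structure is the point: the shard count is fixed at index creation precisely because changing it would re-route existing documents.

```python
import hashlib

def shard_for(doc_id, number_of_shards):
    # Stand-in for Elasticsearch's murmur3-based routing hash.
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % number_of_shards

# One hundred documents spread across a five-shard index.
assignments = {shard_for(f"doc-{i}", 5) for i in range(100)}
print(sorted(assignments))
```

Replication then copies each of those shards to several nodes, which is where the fault-tolerance claims Jepsen puts to the test come in.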


Because index construction is a somewhat expensive process, Elasticsearch provides a faster, more strongly consistent database backed by a write-ahead log. Document creation, reads, updates, and deletes talk directly to this strongly-consistent database, which is asynchronously indexed into Lucene. Search queries lag behind the “true” state of Elasticsearch records, but should eventually catch up. One can force a flush of the transaction log to the index, ensuring changes written before the flush are made visible.


But this is Jepsen, where nothing works the way it’s supposed to. Let’s give this system’s core assumptions a good shake and see what falls out!

Nico's insight:

This article echoes the trouble we had with Elasticsearch partition tolerance. Our fix was to decrease the probability of a partition by running on high-end servers.


eskka - discovery plugin for elasticsearch

eskka aims to provide a robust Zen replacement. It most closely resembles Zen unicast discovery in that no external coordinator is required, but you need to configure seed nodes.


It builds on top of Akka Cluster which uses a gossip protocol. It will help to read the spec.


We use a quorum of configured seed nodes for resolving partitions when a failure is detected (configurable thresholds) by 'downing' the affected node. A node that is downed in this manner will not be able to rejoin the cluster until restarted.


If any node (including the master) loses reachability with a quorum of seed nodes, it clears its internal elasticsearch cluster state. In case it becomes reachable again before it is downed, it will request a publish from current master to get the latest cluster state.


There is no master election per-se, it is deterministically the 'oldest' master-eligible member of the cluster.
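The quorum arithmetic behind this partition resolution is worth spelling out: a side of a partition may keep operating only if it can reach a majority of the configured seed nodes. A minimal sketch of that rule, not eskka's actual implementation:

```python
def has_quorum(reachable_seeds, total_seeds):
    # A strict majority of the configured seed nodes.
    return reachable_seeds >= total_seeds // 2 + 1

# With 5 seed nodes and a 3/2 network split: the 3-side keeps the cluster,
# while nodes on the 2-side clear their cluster state and wait.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False
```

Because the two sides of any split cannot both hold a majority, at most one side keeps accepting work, which is exactly what prevents a split-brain cluster.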

Nico's insight:

And it passes a Jepsen test


Redis alongside Memcached ?


If redis is already a part of the stack, why is Memcached still used alongside Redis?


Redis can do everything that Memcached provides (LRU cache, item expiry, and now clustering as well, in version 3.x, currently in beta, or via tools like twemproxy). The performance is similar too. Moreover, Redis adds persistence, so you don't need to do cache warming after a server restart.


In spite of this, on studying the stacks of large web-scale companies like Instagram, Pinterest, Twitter, etc., I found that they use both Memcached and Redis for different purposes, not using Redis for primary caching. The primary cache is still Memcached, and Redis is used for its data-structure-based logical caching.


My question is, as of 2014, why is Memcached still worth the pain of adding as an additional component to your stack, when you already have a Redis component which can do everything that Memcached can? What are the favorable points that incline architects/engineers to still include Memcached apart from the already existing Redis?


-----

Answered by Salvatore Sanfilippo (aka antirez)

-----



The main reason I see today as a use case for memcached over Redis is the superior memory efficiency you should be able to get with plain HTML fragment caching (or similar applications). If you need to store different fields of your objects in different memcached keys, then Redis hashes are going to be more memory efficient, but when you have a large number of key -> simple_string pairs, memcached should be able to give you more items per megabyte.



Other things which are good points about memcached:


  • It is a very simple piece of code, so if you just need the functionality it provides, it is a reasonable alternative I guess, but I never used it in production.
  • It is multi-threaded, so if you need to scale in a single-box setup, it is a good thing and you need to talk with just one instance.


I believe that Redis as a cache makes more and more sense as people move towards intelligent caching or when they try to preserve structure of the cached data via Redis data structures.


[...]

Nico's insight:

An excellent detailed explanation of the differences between two quite similar memory stores, with illustrative use cases.


Guide to Cassandra Thread Pools


This guide provides a description of the different thread pools and how to monitor them. Includes what to alert on, common issues and solutions.

Nico's insight:

Great details of the SEDA architecture of Cassandra


Using Presto in Netflix's Big Data Platform on AWS

At Netflix, the Big Data Platform team is responsible for building a reliable data analytics platform shared across the whole company. In general, Netflix product decisions are very data driven. So we play a big role in helping different teams to gain product and consumer insights from a multi-petabyte scale data warehouse (DW). Their use cases range from analyzing A/B tests results to analyzing user streaming experience to training data models for our recommendation algorithms.

We shared our overall architecture in a previous blog post. The underpinning of our big data platform is that we leverage AWS S3 for our DW. This architecture allows us to separate compute and storage layers. It allows multiple clusters to share the same data on S3 and clusters can be long-running and yet transient (for flexibility). Our users typically write Pig or Hive jobs for ETL and data analytics.


A small subset of the ETL output and some aggregated data is transferred to Teradata for interactive querying and reporting. On the other hand, we also have the need to do low latency interactive data exploration on our broader data set on S3. These are the use cases that Presto serves exceptionally well. Seven months ago, we first deployed Presto into production and it is now an integral part of our data ecosystem. In this blog post, we would like to share our experience with Presto and how we made it work for us!

Redis cluster, no longer vaporware - Antirez weblog

The first commit I can find in my git history about Redis Cluster is dated March 29 2011, but it is a “copy and commit” merge: the history of the cluster branch was destroyed since it was a total mess of work-in-progress commits, just to shape the initial idea of API and interactions with the rest of the system. Basically it is a roughly four-year-old project. This is about two thirds of the whole history of the Redis project. Yet it is only today that I’m releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.
Nico's insight:

An article saluted by the community for its clear statements about the chosen trade-offs on availability, consistency and partition tolerance.


Distributed Rate Limiting With Redis

Rate limiting is essential in many applications, whenever access to an expensive resource needs to be restricted. With many modern webapps running on multiple processes and servers, state has to be shared. An ideal solution would be efficient, fast, and not rely on individual app servers being tied to a certain client (due to load balancing) or holding any state themselves.
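The classic Redis approach to this is a fixed-window counter built from INCR plus EXPIRE. Here is a self-contained sketch in that style: a dict stands in for Redis so the example runs on its own; with real Redis, INCR's atomicity is what makes the same logic safe across many app servers.

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # (client, window_start) -> count; Redis would EXPIRE these keys.
        self.counters = {}

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        # Key shape mirrors the Redis pattern "rate:{client}:{window}".
        key = (client, int(now // self.window))
        count = self.counters.get(key, 0) + 1   # the INCR step
        self.counters[key] = count
        return count <= self.limit

rl = FixedWindowLimiter(limit=3, window_seconds=60)
results = [rl.allow("alice", now=100 + i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

The state lives entirely in the shared store, so any app server behind the load balancer can answer for any client; the known weakness of fixed windows is a burst straddling a window boundary, which sliding-window variants address.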

Nico's insight:

A good practical example of implementing a distributed feature.


Redis Sentinel at Flickr


We recently implemented Redis Sentinel at Flickr to provide automated Redis master failover for an important subsystem and we wanted to share our experience with it. Hopefully, we can provide insight into our experience adopting this relatively new technology and some of the nuances we encountered getting it up and running. Although we try to provide a basic explanation of what Sentinel is and how it works, anyone who is new to Redis or Sentinel should start with the excellent Redis and Sentinel documentation.


At Flickr we use an offline task processing system that allows us to execute heavyweight operations asynchronously from our API and web pages. This prevents these operations from making users wait needlessly for pages to render or API methods to return. Our task system handles millions of tasks per day which includes operations like photo uploads, user notifications and metadata edits. In this system, code can push a task onto one of several Redis-backed queues based on priority and operation, then forget about the task. Many of these operations are critical and we need to make sure we process at least 99.9999% of them (less than 1 in 1 million dropped). Additionally, we need to make sure this system is available to insert and process tasks at least 99.995% of the time – no more than about 2 minutes a month downtime.
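The push-and-forget pattern over several priority queues can be sketched as follows: tasks are pushed onto one of several lists by priority, and a worker drains the highest-priority non-empty queue first. With real Redis this is LPUSH plus a blocking BRPOP over several keys in priority order; in-memory deques stand in for the Redis lists here, and the queue names are illustrative.

```python
from collections import deque

queues = {"high": deque(), "low": deque()}

def push(priority, task):
    queues[priority].appendleft(task)   # LPUSH tasks:{priority}

def pop():
    for priority in ("high", "low"):    # BRPOP tasks:high tasks:low ...
        if queues[priority]:
            return queues[priority].pop()
    return None                          # a real worker would block instead

push("low", "resize-photo")
push("high", "notify-user")
print(pop())  # notify-user
print(pop())  # resize-photo
```

The durability targets in the paragraph above are exactly why master failover matters here: every queued task lives only in Redis between push and pop, so losing the master means losing tasks.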

Nico's insight:

A really nice detailed article about putting Redis Sentinel in production, including expected failure and data loss scenarios


Why local state is a fundamental primitive in stream processing


One of the concepts that has proven the hardest to explain to people when I talk about Samza is the idea of fault-tolerant local state for stream processing. I think people are so used to the idea of keeping all their data in remote databases that any departure from that seems unusual.


So, I wanted to give a little bit more motivation as to why we think local state is a fundamental primitive in stream processing.
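The core of the argument can be sketched in a few lines: a stream task keeps its state (here, per-key counts) in a local store and emits every change to a changelog, so a restarted task rebuilds its state by replaying the changelog instead of querying a remote database on every message. A hedged illustration of the Samza-style idea, not Samza's API:

```python
def process(stream, state, changelog):
    for key in stream:
        # State updates are local (fast, no network round-trip per message)...
        state[key] = state.get(key, 0) + 1
        # ...and every change is captured for recovery (Samza uses a
        # compacted Kafka topic as the changelog).
        changelog.append((key, state[key]))

state, changelog = {}, []
process(["a", "b", "a"], state, changelog)
print(state)  # {'a': 2, 'b': 1}

# Simulate a crash: rebuild the state purely from the changelog.
recovered = {}
for key, value in changelog:
    recovered[key] = value
print(recovered == state)  # True
```

Local state gives the throughput of in-process lookups while the changelog gives the fault tolerance people normally get from the remote database.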

Nico's insight:

Even if I don't agree with the use of "fundamental" in the title, this article lays out well the practical choices to make when processing streams.


A Look at Nanomsg and Scalability Protocols (Why ZeroMQ Shouldn’t Be Your First Choice)


Earlier this month, I explored ZeroMQ and how it proves to be a promising solution for building fast, high-throughput, and scalable distributed systems. Despite lending itself quite well to these types of problems, ZeroMQ is not without its flaws. Its creators have attempted to rectify many of these shortcomings through spiritual successors Crossroads I/O and nanomsg.


The now-defunct Crossroads I/O is a proper fork of ZeroMQ with the true intention being to build a viable commercial ecosystem around it. Nanomsg, however, is a reimagining of ZeroMQ—a complete rewrite in C. It builds upon ZeroMQ's rock-solid performance characteristics while providing several vital improvements, both internal and external. It also attempts to address many of the strange behaviors that ZeroMQ can often exhibit. Today, I'll take a look at what differentiates nanomsg from its predecessor and implement a use case for it in the form of service discovery.

Nico's insight:

ZeroMQ, nanomsg and friends seem to be useful pieces for building the network layer of any distributed software


Powering big data at Pinterest


Big data plays a big role at Pinterest. With more than 30 billion Pins in the system, we’re building the most comprehensive collection of interests online. One of the challenges associated with building a personalized discovery engine is scaling our data infrastructure to traverse the interest graph to extract context and intent for each Pin.


We currently log 20 terabytes of new data each day, and have around 10 petabytes of data in S3. We use Hadoop to process this data, which enables us to put the most relevant and recent content in front of Pinners through features such as Related Pins, Guided Search, and image processing. It also powers thousands of daily metrics and allows us to put every user-facing change through rigorous experimentation and analysis.


In order to build big data applications quickly, we’ve evolved our single cluster Hadoop infrastructure into a ubiquitous self-serving platform.


Nico's insight:

Interesting to see that Pinterest's architecture is similar to ours. At Scoop.it we don't have that much data, but we have the same challenges.

Should I conclude that quantity doesn't matter, only quality?


Managing the Internet Enterprise Database with OpsCenter 5.0

We’re pleased to announce the immediate availability of DataStax OpsCenter 5.0, our visual management and monitoring solution for Apache Cassandra and DataStax Enterprise (DSE). With version 5.0, DataStax OpsCenter vividly demonstrates how it has become a serious enterprise-class database tool capable of managing and monitoring the types of distributed database deployments found in modern Internet enterprise applications.
Nico's insight:

This is starting to look like a real Cassandra monitoring system rather than a graphing tool.


Stuff you expect CQL to do...

cqlsh:stats> CREATE TABLE userstats (
         ...     username varchar,
         ...     countername varchar,
         ...     value counter,
         ...     PRIMARY KEY (username, countername)
         ... ) WITH comment='User Stats' and compact storage;
cqlsh:stats> insert into userstats (username,countername, value) values ('ed','a',1);
Bad Request: INSERT statement are not allowed on counter tables, use UPDATE instead
cqlsh:stats> update userstats (username,countername, value) values ('ed','a',1);
Bad Request: [...]
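For the record (not part of the original scoop): the increment these statements were reaching for uses CQL's counter-arithmetic UPDATE syntax, which does work on the table above:

```sql
-- Counters can only be modified through relative updates:
UPDATE userstats SET value = value + 1 WHERE username = 'ed' AND countername = 'a';
```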

Nico's insight:

Cassandra's model is so distinct from, and more restrictive than, a classic relational model that I always had doubts about the suitability of an SQL-like language for Cassandra. Here is the proof.


Call me maybe: RabbitMQ


RabbitMQ is a distributed message queue, and is probably the most popular open-source implementation of the AMQP messaging protocol. It supports a wealth of durability, routing, and fanout strategies, and combines excellent documentation with well-designed protocol extensions. I’d like to set all these wonderful properties aside for a few minutes, however, to talk about using your queue as a lock service.


Scaling Feature Flags With Zookeeper

"I love feature flags. They’re one of those tools that make running complicated systems much more manageable. For those not in the know, feature flags are a way of adding a conditional to your code that lets a configurable number of users or requests through, originally designed for restricting new features to internal users before rolling them out to the rest of the userbase. Yeller uses its own clojure library for feature flags: shoutout, which is mostly just a clojure port of James Golick’s rollout."

Nico's insight:

"Feature flag" is kind of the trendy and narrow use of a dynamic configuration manager. But the technology is basically the same, it's all about the choice of the data store.
