After winning the SBUG drawing for the QCON ticket, I attended the conference in London on March 11 – 13, 2009. The conference covered a wide range of technologies and topics, from designing web architectures, to detailed sessions on Java and .Net development, to running Agile Scrum projects with distributed teams. A full schedule of the conference can be found at http://qconlondon.com/london-2009/schedule, including links to many of the actual presentations (some are password-protected – use qconlondon2009 / seeyounextyear for the user / password, respectively).
Here are my notes and thoughts on the sessions I found particularly interesting:
Web Oriented Architectures
This talk focused on why Web 2.0 architectures matter to companies offering services over the internet.
Point 1 – Architectures have become much more complex over time, evolving from stand-alone apps to internet-based apps built on SOA.
Point 2 – The central goal of SOA is to turn apps into open platforms by openly exposing software features and data to customers. Achieving this goal allows SOAs to leverage the “network effect” – a theory central to Web 2.0, claiming that the value of a service increases as the number of its users increases (e.g. social networking sites).
Point 3 – Web Oriented Architectures (WOA) aim to create new competitive advantages online by exposing SOA services and facilitating their use / adoption (open APIs, developer frameworks, etc.).
Point 4 – REST is the predominant technology for implementing WOA, as it makes services easy to consume and leverages the benefits of HTTP (scalability, caching, stateless communication, etc.). SOAP has not been used widely in WOAs.
One thing I think it’s important to clarify here is the inherent difference between services offered over the internet, and those services a company only exposes internally – I don’t believe the same rules / best practices necessarily apply in both cases. For example, REST’s easy consumption may be good for externally exposed services, but the more explicit transactional and security mechanisms built into SOAP / WS-* standards may be necessary for internally consumed services.
Cloud Data Persistence
This session covered the general approaches and benefits of cloud data persistence, several of the cloud data persistence providers, and the differences between their offerings.
General Details – Cloud Data Persistence
- Internet-scale applications require a different approach to storing data, due to the size of their data sets
- Cloud data persistence is optimized for large data sets, but at a cost (fewer capabilities around joins, less control over transactions, fewer RDBMS-style features)
Google BigTable
- Distributed data store built by Google
- Can hold hundreds of terabytes across thousands of machines
- Effectively a big, distributed sorted map
- Each row has a key, and is a set of columns grouped into families (can have multiple values per column)
- Data is versioned, transactional and consistent
- Some hard problems “outsourced” (GFS - Google File System, Chubby - distributed lock manager, data tables sharded into tablets)
- Similar open source options: Hypertable, Apache HBase
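To make the "big sorted map" idea concrete, here is a minimal in-memory sketch of my own. It is not BigTable's API or anything shown in the session: the class name `TinySortedTable`, its methods, the `family:qualifier` column naming, and the integer timestamps are all assumptions chosen for illustration. It shows only the data model (sorted row keys, columns grouped into families, multiple timestamped values per column), and says nothing about distribution, tablets, or transactions.

```python
import bisect


class TinySortedTable:
    """Toy model: row key -> "family:qualifier" column -> versioned values."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: [(timestamp, value), ...]}

    def put(self, row_key, column, value, timestamp):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        versions = self._rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep the newest version first.
        versions.sort(key=lambda pair: pair[0], reverse=True)

    def get(self, row_key, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        for ts, value in self._rows.get(row_key, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_key, end_key):
        """Return row keys in [start_key, end_key), in sorted order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return self._keys[lo:hi]


table = TinySortedTable()
table.put("com.example/index", "contents:html", "<html>v1</html>", timestamp=1)
table.put("com.example/index", "contents:html", "<html>v2</html>", timestamp=2)
table.put("com.example/about", "anchor:home", "About us", timestamp=1)
table.put("org.sample/home", "contents:html", "<html>hi</html>", timestamp=1)

latest = table.get("com.example/index", "contents:html")
older = table.get("com.example/index", "contents:html", timestamp=1)
example_rows = table.scan("com.", "com/")  # all row keys starting with "com."
```

Because the row keys are sorted, rows with a common key prefix sit next to each other, which is what makes the range scan at the end cheap.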
Amazon Dynamo
- Distributed data store built by Amazon
- Key-value store
- Designed for availability: tolerate network partitions and server failures, non-transactional and "eventually consistent"
- Used for things like shopping carts and session management
- Combines several techniques: a decentralized architecture with no master; data partitioned and replicated via consistent hashing; multi-node reads and writes for redundancy; versioned objects for consistency
- Vector clocks are used to disambiguate between several versions of the same object: a vector clock is a list of (node, counter) tuples; each object instance has one; when an object is written, the vector clock on each node is updated; this has implications for the API (a write must specify the vector clock version); and if a read returns two different versions, Dynamo can't figure it out, so you have to!
- Availability requirements push consistency obligations onto the programmer API
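The vector clock idea above can be sketched in a few lines. This is my own illustration, not Dynamo's API: the function names are hypothetical, and I store each clock as a dict of node name to counter rather than a list of (node, counter) tuples. The sketch assumes the node handling a write bumps its own counter, and shows how two versions written independently cannot be ordered, so the caller has to reconcile them.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a, b):
    """Classify two clocks as equal, ordered, or conflicting."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "conflict"  # concurrent writes: the application must decide


def merge(a, b):
    """Element-wise maximum, used after the application reconciles a conflict."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


# One initial write, then two writes that each start from v1 on different nodes.
v1 = increment({}, "node_a")
v2 = increment(v1, "node_b")
v3 = increment(v1, "node_c")

ordered = compare(v2, v1)     # v2 was derived from v1
diverged = compare(v2, v3)    # neither has seen the other's write

# The client picks or merges the values, then writes back with a merged clock.
reconciled = increment(merge(v2, v3), "node_a")
```

After reconciliation, the new clock descends from both conflicting versions, so later reads can discard them.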
Plate-Spinning on EC2
- Cloud from the IaaS perspective
- Multiple VMs running MySQL - one master, many slaves
- When the master instance goes away, designate a new master from the remaining slaves, and add another instance
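The failover step could look roughly like the sketch below. Everything here is hypothetical: the `Cluster` type, the `handle_master_loss` function, and the `launch_instance` callback stand in for whatever monitoring and EC2 provisioning code a real setup would use, and promoting the first remaining slave is a simplification (a real system would likely pick the slave that is furthest along in replication).

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    master: str
    slaves: list


def handle_master_loss(cluster, launch_instance):
    """Promote a remaining slave to master and start one replacement instance."""
    if not cluster.slaves:
        raise RuntimeError("no slaves left to promote")
    new_master = cluster.slaves[0]   # simplification: take the first slave
    remaining = cluster.slaves[1:]
    replacement = launch_instance()  # hypothetical hook that starts a new VM
    return Cluster(master=new_master, slaves=remaining + [replacement])


before = Cluster(master="db-1", slaves=["db-2", "db-3"])
after = handle_master_loss(before, launch_instance=lambda: "db-4")
```

The cluster ends up with the same number of instances as before the failure, which is the "plate-spinning" part.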
Amazon SimpleDB
- A tabular data store
- "Domains" (like tables) contain "items" (like rows)
- Specifics: auto-indexing, eventually consistent, no cross-domain joins, query limited to 250 items, everything is a string (number issues / date issues), all comparisons are lexicographical
- Exposes a REST-ful-ish API
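The "everything is a string" point is easier to see with an example. Because comparisons are lexicographical, the string "10" sorts before "9". A common workaround is to zero-pad numbers and store dates in a fixed-width format such as ISO 8601. The helpers below are my own sketch with made-up names (`encode_int`, `encode_datetime`), not part of any SimpleDB client library, and they only handle non-negative integers and timezone-aware datetimes.

```python
from datetime import datetime, timezone


def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative numbers need an offset before padding")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def encode_datetime(moment):
    """Format a timezone-aware datetime as fixed-width UTC ISO 8601 text."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Plain strings sort in the "wrong" order for numbers.
naive_order = sorted(["9", "10", "100"])

# Padded strings sort the same way the numbers do.
padded_order = sorted(encode_int(n) for n in [9, 10, 100])

first_day = encode_datetime(datetime(2009, 3, 11, tzinfo=timezone.utc))
last_day = encode_datetime(datetime(2009, 3, 13, tzinfo=timezone.utc))
```

The padding width has to be chosen up front and applied to every value of an attribute, otherwise the ordering breaks again.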
Azure Services Platform
This large topic spanned two sessions, taking up the entire morning of the last day of the conference. Beat Schwegler discussed the core components of Azure: Web Roles, Worker Roles, and Storage (tables, blobs, and queues) – along with Azure development and deployment methods. See the slides posted on the QCON site for more details.
While the details of Azure have been covered in many sessions since PDC 2008, Beat discussed a couple of things I had not heard about:
Programming Models for the Cloud
These are based on Eric Brewer’s CAP theorem, which states that a large-scale (internet-scale) software system can guarantee only two of the following three properties:
1. Consistency
2. Availability
3. Tolerance to Network Partitions
This limitation leads to BASE semantics (as opposed to ACID semantics):
1. Basically Available
2. Soft-State
3. Eventually Consistent
This is an important shift, requiring us to re-think the design of cloud-hosted applications and services. The presentation included a table that I found particularly helpful in defining the difference between non-cloud and cloud-based programming models:
| Where did we start (non-cloud)? | Where did we end up (cloud)? |
| --- | --- |
| Shared state | Partitioned / replicated state |
| ACID transactions | Eventual consistency |
| Exactly-once messaging | Best-effort messaging |
| Machine loss is BAD | Machine loss is business as usual |
| Keep processes running | Recovery-oriented computing |