Sources for inspiration: Data Visualization

Visualizations:

Wind Map

Word Cloud

Presentation: 

Prezi

Brainshark

Tools:

ColorPic

Graphjam

Blogs:

Storytellingwithdata

Dataplusscience

Chandoo

Andy Kriebel

Kaiser Fung

Miscellaneous: 

Flare

D3.js, Processing.js, Highcharts, AmCharts, Raphaël.js

R Shiny

Visualization Toolkit

MayaVi (Python)

Geocoding:

findlatitudeandlongitude

Guidelines: 

Choosing the Right Chart

Stephen Few on Data Visualization

The People: 

Stephen Few

Mike Bostock and Jeffrey Heer from the Stanford Visualization Group

Moritz Stefaner – OECD

Amanda Cox

Ben Shneiderman

Jon Nelson

Nicholas Felton

Data Preparation: 

Data Wrangler for data cleaning and transformation

Google Refine

ASAP Utilities for Excel

Extract data from the Web (import.io)

Video:

Journalism in the Age of Data

Readings:

The Data Deluge

Application using MongoDB with FreeMarker and Spark

We will use the FreeMarker template engine to create the template mappings. To keep the web application simple and in Java code alone, the Spark web application framework is used.
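Below is a minimal sketch of such an application, assuming the 2.x MongoDB Java driver, the pre-lambda Spark 1.x API, and FreeMarker on the classpath, with the document stored in a local test database; the class name HelloMongoSparkApp and the hello.ftl template (say, containing Hello, ${name}!) are illustrative, not taken from the linked source.

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import freemarker.template.Configuration;
import freemarker.template.Template;
import spark.Request;
import spark.Response;
import spark.Route;
import spark.Spark;
import java.io.StringWriter;

public class HelloMongoSparkApp {
    public static void main(String[] args) throws Exception {
        // FreeMarker loads templates (hello.ftl) from the root of the classpath
        final Configuration configuration = new Configuration();
        configuration.setClassForTemplateLoading(HelloMongoSparkApp.class, "/");

        // Connect to a local mongod and open the collection queried below
        MongoClient client = new MongoClient("localhost");
        DB database = client.getDB("test");
        final DBCollection collection = database.getCollection("hello");

        // Map "/" to a handler that renders the first document through the template
        Spark.get(new Route("/") {
            @Override
            public Object handle(Request request, Response response) {
                try {
                    Template helloTemplate = configuration.getTemplate("hello.ftl");
                    DBObject document = collection.findOne(); // e.g. { "name" : "mongo" }
                    StringWriter writer = new StringWriter();
                    helloTemplate.process(document.toMap(), writer);
                    return writer;
                } catch (Exception e) {
                    response.status(500);
                    return e.getMessage();
                }
            }
        });
    }
}

Spark starts an embedded Jetty server on port 4567 by default, so visiting http://localhost:4567/ renders the name field of the document verified in the shell query below.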

Source (Git)

MongoDB Java Application

MongoDB Java Application 2

Querying MongoDB from the mongo shell to verify the value (mongo) displayed by the Java application:

> db.hello.findOne()

{ "_id" : ObjectId("52ce3490c85b1177b58ec638"), "name" : "mongo" }

Simply Hadoop!

One of the important channels through which businesses provide their services is their websites and applications. However, as these businesses expand their portfolio of services, their applications need to support a growing customer base. From an IT infrastructure standpoint, this scenario poses two major challenges.

First, more web and application servers must be added to support the larger number of users accessing these websites. Second, the databases have to be scaled to handle more data.

Looking specifically at the growing data, additional databases have to be introduced and integrated with the existing ones. In scaling and maintaining these databases, the challenges to be addressed include synchronization, reliability (through replication), and recovery from hardware failures. At the same time, the databases must remain efficient, offering low latency, good performance, and high availability.

The Hadoop project is a framework created to address these issues. In simple terms, Hadoop facilitates storing and analyzing large data sets across clusters of commodity hardware that can scale with the needs of the business.
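As a concrete example, the canonical WordCount job (essentially the version in the Apache Hadoop MapReduce tutorial) shows the model: a mapper emits a (word, 1) pair for every word in its input split, and a reducer sums the counts for each word, with Hadoop distributing both phases across the cluster.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in each input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the input lives in HDFS, the same code runs unchanged whether the cluster has one node or hundreds; scaling is a matter of adding commodity machines.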