Posts Tagged ‘statistics’

Definition: “Extremely scalable analytics – analyzing petabytes of structured and unstructured data at high velocity.”

Definition: “Big data is data that exceeds the processing capacity of conventional database systems.”

Big Data has three characteristics:

Variety – Structured and unstructured data

Velocity – Time-sensitive data that should be used alongside its enterprise data counterparts in order to maximize value

Volume – Size of data exceeds the nominal storage capacity of the enterprise.

Statistics:

– In 2011, the global output of data was estimated to be 1.8 zettabytes (1 zettabyte = 10^21 bytes)

– 90% of the world's data has been created in the last 2 years.

– We create 2.5 quintillion (10^18) bytes of data per day (from sensors, social media posts, digital pictures, etc.)

– The digital world will increase in capacity 44-fold between 2009 and 2020.

– Only 5% of data is being created in structured form; the other 95% is largely unstructured.

– 80% of the effort involved in dealing with unstructured data is reconditioning ill-formed data to well-formed data (cleaning it up).
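To put a couple of these numbers on a common scale, here is a quick back-of-the-envelope sketch in Python. It assumes decimal units (1 zettabyte = 10^21 bytes), a 365-day year, and steady compound growth between 2009 and 2020. The function names are mine and purely illustrative.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition


def yearly_output_zb(bytes_per_day: float, days: int = 365) -> float:
    """Convert a daily data-creation rate into zettabytes per year."""
    return bytes_per_day * days / ZETTABYTE


def implied_annual_growth(fold_increase: float, years: int) -> float:
    """Constant yearly growth rate that compounds to `fold_increase` over `years`.

    Assumes growth is steady, which real data growth need not be.
    """
    return fold_increase ** (1 / years) - 1


if __name__ == "__main__":
    # 2.5 quintillion bytes per day, expressed per year
    print(f"{yearly_output_zb(2.5e18):.2f} ZB per year")
    # 44-fold growth over the 11 years from 2009 to 2020
    print(f"{implied_annual_growth(44, 2020 - 2009):.1%} growth per year")
```

Under those assumptions, 2.5 quintillion bytes per day works out to roughly 0.9 zettabytes per year, and a 44-fold increase over 11 years corresponds to about 41% growth per year.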

Performance Statistics (I will start tracking more closely):

– Traditional data storage costs approximately $5/GB, but storing the same data using Hadoop costs only $0.25/GB – yep, 25 cents/GB. Hmm!

– Facebook stores more than 20 petabytes of data across 23,000 cores, with 50 terabytes of raw data being generated per day.

– eBay uses over 2,600 clustered Hadoop servers.
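To get a feel for what the per-GB price gap means at scale, here is a small Python sketch. It assumes decimal units (1 PB = 1,000,000 GB) and applies the two per-GB prices above to a hypothetical 20 PB data set. Pairing those prices with a 20 PB volume is my own illustration, not a claim about what any company actually pays, and the names in the code are made up for the example.

```python
GB_PER_PB = 10**6  # decimal units: 1 PB = 1,000,000 GB (assumption)

TRADITIONAL_USD_PER_GB = 5.00  # approximate figure quoted in the post
HADOOP_USD_PER_GB = 0.25       # approximate figure quoted in the post


def storage_cost_usd(petabytes: float, usd_per_gb: float) -> float:
    """Total storage cost in dollars for a volume given in petabytes."""
    return petabytes * GB_PER_PB * usd_per_gb


if __name__ == "__main__":
    volume_pb = 20  # hypothetical volume, chosen only for illustration
    traditional = storage_cost_usd(volume_pb, TRADITIONAL_USD_PER_GB)
    hadoop = storage_cost_usd(volume_pb, HADOOP_USD_PER_GB)
    print(f"Traditional: ${traditional:,.0f}")
    print(f"Hadoop:      ${hadoop:,.0f}")
    print(f"Ratio:       {traditional / hadoop:.0f}x")
```

With these inputs the traditional figure comes to $100 million against $5 million for Hadoop, a 20x difference, which is simply the ratio of the two per-GB prices.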


Here are some interesting statistics from IDC:

>> An organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move between applications.

>> The same organization will lose another $5.3 million a year from not finding information that already exists in the organization.

>> Over 95% of the digital universe is “unstructured data,” that is, content that cannot be represented by fields in a record (e.g., name, address, date).

>> In most organizations, unstructured data accounts for more than 80% of all information.

 

