Titanic story and dataset

 


The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

Please download the dataset from here.

Popular posts from this blog

What is the difference between "inplace=True" and "inplace=False"?

Both inplace=True and inplace=False control how a pandas operation is applied to the data. When inplace=True is used, the operation modifies the data directly and nothing is returned (the call returns None): df.some_operation(inplace=True). When inplace=False is used, which is the default, the original data is left unchanged and a new, modified copy is returned, so the result has to be assigned: df = df.an_operation(inplace=False).
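As a minimal sketch of the difference, the example below uses pandas' sort_values on a small made-up DataFrame. The column names and values are invented for illustration only; any method that accepts an inplace argument should behave the same way.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame({"name": ["b", "a", "c"], "age": [30, 25, 35]})

# inplace=False (the default): df is left untouched and a new,
# sorted DataFrame is returned.
sorted_df = df.sort_values("age", inplace=False)
ages_after_copy = list(df["age"])        # original order is preserved
ages_in_sorted = list(sorted_df["age"])  # the returned copy is sorted

# inplace=True: df itself is sorted and the call returns None.
result = df.sort_values("age", inplace=True)
ages_after_inplace = list(df["age"])

print(ages_after_copy, ages_in_sorted, result, ages_after_inplace)
```

A common mistake is writing df = df.sort_values("age", inplace=True), which replaces df with None.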

Levenshtein distance

In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965. Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics known collectively as edit distance. It is closely related to pairwise string alignments.
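The definition above can be sketched with the usual dynamic-programming approach. This is one common way to compute the distance, not the only one; the function name levenshtein is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Going from "kitten" to "sitting" takes three edits: substitute k with s, substitute e with i, and insert g at the end.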

Differences between Hadoop and Spark?

The key difference between Hadoop MapReduce and Spark lies in how they process data: Spark can process data in memory, while Hadoop MapReduce has to read from and write to disk. As a result, processing speed differs significantly, and Spark may be up to 100 times faster.