
Data Science Foundation

Data science is an interdisciplinary field focused on extracting knowledge from data sets, which are typically large. The field encompasses preparing data for analysis, performing the analysis itself, and presenting findings to inform high-level decisions in an organization.

As such, it incorporates skills from computer science, mathematics, statistics, information visualization, graphic design, complex systems, communication and business. Statistician Nathan Yau, drawing on Ben Fry, also links data science to human-computer interaction: users should be able to intuitively control and explore data. 

In 2015, the American Statistical Association identified database management; statistics and machine learning; and distributed and parallel systems as the three emerging foundational professional communities.


Popular posts from this blog

What is the difference between "inplace = True" and "inplace = False"?

Both inplace=True and inplace=False apply an operation to the data, but they differ in what is returned. When inplace=True is used, the operation modifies the data in place and returns None: df.some_operation(inplace=True). When inplace=False (the default) is used, the original data is left untouched and a modified copy is returned: df = df.some_operation(inplace=False).
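A minimal sketch of this behavior, using pandas sort_values as the example operation (any operation that accepts inplace behaves the same way):

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# inplace=False (the default): df is unchanged, a sorted copy is returned
sorted_copy = df.sort_values("a", inplace=False)

# inplace=True: df itself is sorted and the call returns None
result = df.sort_values("a", inplace=True)

print(result)                # None
print(list(df["a"]))         # df is now sorted in place
```

Because inplace=True returns None, chaining calls like df.sort_values("a", inplace=True).head() raises an error; this is one reason the copy-returning style is generally preferred.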

Levenshtein distance

In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965. Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics known collectively as edit distance. It is closely related to pairwise string alignments.
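The "minimum number of single-character edits" can be computed with the standard Wagner–Fischer dynamic-programming algorithm; a compact sketch keeping only one row of the table at a time:

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn s into t (Wagner-Fischer, row by row)."""
    # prev[j] holds the distance from the empty prefix of s to t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

For example, "kitten" becomes "sitting" via three edits: substitute s for k, substitute i for e, and insert g.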

Differences between Hadoop and Spark?

The key difference between Hadoop MapReduce and Spark lies in the approach to processing: Spark can process data in memory, while Hadoop MapReduce has to read from and write to disk between stages. As a result, processing speed differs significantly: Spark may be up to 100 times faster for in-memory workloads.