
scikit-learn random state in splitting dataset

random_state, as the name suggests, initializes the internal random number generator, which determines how the data are split into train and test indices in your case.

This matters when you want to check and validate results across multiple runs of the code. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code, so (unless some other source of randomness is present in the process) the results will be identical on every run. This makes the output easy to verify.
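As a minimal sketch, fixing random_state in train_test_split produces the exact same split on every call, while omitting it (or passing None) generally does not:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 10 samples, 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two splits with the same fixed random_state...
X_tr_a, X_te_a, y_tr_a, y_te_a = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr_b, X_te_b, y_tr_b, y_te_b = train_test_split(X, y, test_size=0.3, random_state=42)

# ...yield identical train/test indices.
assert np.array_equal(X_te_a, X_te_b)
assert np.array_equal(y_tr_a, y_tr_b)
```

Any fixed integer works; 42 is just a common convention, not a special value.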
