In this article, we discuss a state-of-the-art NLP pipeline that groups randomly selected articles from www.amazon.com into relevant topics. We use webhose.io for data ingestion, IBM Watson Developer Cloud for named entity recognition, MongoDB for storage, and a Flask app to display the results.
The following diagram depicts the various components of the NLP pipeline and their inter-connections.
As seen in the figure above, the dataset, which is a collection of article...
We have seen the K-means algorithm in R & Python before; now let me explain a very basic classification algorithm in R. I will briefly introduce the K-Nearest Neighbour (K-NN) concept, the steps involved in it, and how to implement those steps in R.
K-NN algorithm:
K-NN is one of the simplest supervised learning algorithms and is based on a similarity function. In K-NN there is a categorical target variable whose values are partitioned into predetermined classes (categories). The procedure follows...
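The original article implements K-NN in R. As a hedged illustration of the same idea, here is a minimal Python sketch that assumes numeric features, uses Euclidean distance as the similarity function, and classifies by a simple majority vote among the k nearest training points. The function names and the toy data are hypothetical, not taken from the article.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # Step 1: distance from the query to every labelled training point
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 2: keep the k closest points
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 3: majority vote; Counter breaks ties in favour of the label
    # encountered first, i.e. the one belonging to the closer neighbour
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups in two dimensions
    X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
    y = ["A", "A", "A", "B", "B", "B"]
    print(knn_predict(X, y, (1.2, 1.4), k=3))
    print(knn_predict(X, y, (6.8, 6.2), k=3))
```

The choice of k matters: a small k follows the training data closely, while a larger k smooths the decision at the cost of blurring class boundaries. Features on very different scales are usually standardized first, since Euclidean distance is scale-sensitive.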
We have seen gradient descent in R & Python before; now let us try a clustering algorithm in R & Python. I will briefly introduce K-means and the steps involved in it, and implement these steps in both R & Python.
K-means algorithm
K-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure follows a simple and easy way to group a given set of observations into a certain number of clusters.
K-means algorithm co...
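The original post implements K-means in R and Python, but its code is not shown here. The following is a minimal, hedged Python sketch of the standard Lloyd iteration: pick k observations as initial centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It is not the author's code; the function names, the seed, and the toy data are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: use k distinct observations as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Step 2: assign each point to its closest centroid
        labels = assign(points, centroids)
        # Step 3: recompute each centroid as the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the previous centroid if a cluster becomes empty
                new_centroids.append(centroids[c])
        # Step 4: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    # Hypothetical toy data with two obvious groups
    pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
    centroids, labels = kmeans(pts, k=2)
    print(centroids, labels)
```

Because the result depends on the initial centroids, practical implementations usually run K-means several times from different starts (or use a smarter seeding scheme) and keep the solution with the lowest within-cluster sum of squares.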
Data science is a multidisciplinary blend of data inference, algorithm development, and technology used to solve analytically complex problems.
Ultimately, data science is about using data in creative ways to generate business value.
Data science can be divided into two broad areas:
1. Discovery of data insights: quantitative data analysis that helps steer strategic business decisions.
2. Development of data products: algorithmic solutions running in production, operating at scale. For...
We are going to compare functions that exist in both R and Python for the same operations. For this, we use the Titanic dataset, which contains passenger details.
Importing a CSV
Reading data is similar in both languages; the only difference is that in Python we have to import the pandas library to read the data. Once the data is imported, we can explore it with the functions below.
# R
titanic <- read.csv("train.csv")

# Python
import pandas as pd
titanic ...
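To make the R-to-Python comparison concrete, here is a hedged pandas sketch of the usual first-look functions, with the closest base-R call noted beside each one. The mapping is approximate (for example, `df.describe()` summarizes only numeric columns by default, unlike R's `summary()`). To keep the example self-contained, it reads a small in-memory CSV whose column names are assumed to resemble the Titanic passenger file and whose values are placeholders; with the real data you would call `pd.read_csv("train.csv")` instead. `SAMPLE_CSV` and `explore` are hypothetical names.

```python
import io

import pandas as pd

# Placeholder stand-in for "train.csv" (values are illustrative only)
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,,7.92
4,0,2,male,35,13.00
"""


def explore(df: pd.DataFrame) -> dict:
    """Collect common first-look summaries; comments give the nearest base-R call."""
    return {
        "shape": df.shape,             # R: dim(titanic)
        "head": df.head(3),            # R: head(titanic, 3)
        "dtypes": df.dtypes,           # R: str(titanic), roughly
        "summary": df.describe(),      # R: summary(titanic), numeric columns only
        "missing": df.isna().sum(),    # R: colSums(is.na(titanic))
    }


if __name__ == "__main__":
    titanic = pd.read_csv(io.StringIO(SAMPLE_CSV))
    info = explore(titanic)
    print(info["shape"])
    print(info["missing"])
```

Note that the empty `Age` field in the third row becomes `NaN` in pandas, just as it becomes `NA` in R, so both languages report it as missing.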
What skills are required to be a Data Scientist?
Is a strong mathematics background required to pursue a career as a data scientist?
We at Rang Technologies see a lot of questions like this. When you're trying to break into the field, it's hard to know exactly how much math & stats you need.
Primarily, it depends on how a company defines "data scientist." Some companies say "data scientist" but really mean "data engineer," which is much more focused on the software engineerin...
Nowadays, retail banking is one of the most important businesses in the banking sector. Banks rely on it to grow their customer base, retain existing customers, and increase revenue by offering different products to customers.
Today's world is digital and data is growing rapidly, so a bank has to distinguish between customers in terms of market opportunity, risk, and profitability.
Of course, statistics plays a key role in this situation. The following are the different modelling approaches for retail banking ...
Forecasting is a common technique that many companies use to make predictions about the future. There are multiple approaches to forecasting, such as time series forecasting and multivariate forecasting, and within these there are techniques such as Moving Average (MA), ARIMA, ARMA, ARCH, GARCH, etc.
Most data mining tools, such as SAS and SPSS, provide packages and algorithms for forecasting. These packages help in accomplishing small-scale forecasting; here 'sm...
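As a minimal illustration of what a forecast is, here is a hedged Python sketch of a simple moving-average forecast, a common naive baseline. Note that this smoothing method is not the same thing as the MA(q) component of ARMA/ARIMA models, which describes a series in terms of past forecast errors and is normally fitted with a statistics package. The function name and the example series are hypothetical.

```python
def moving_average_forecast(series, window, horizon=1):
    """Forecast `horizon` future values with a simple moving average.

    Each forecast is the mean of the most recent `window` values; forecasts
    already produced are appended to the history and reused for later steps.
    """
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


if __name__ == "__main__":
    # Hypothetical monthly sales figures
    sales = [10, 12, 14, 16]
    print(moving_average_forecast(sales, window=2, horizon=3))
```

Because every step only averages recent values, this baseline lags behind any trend (here the forecasts flatten out instead of continuing to rise), which is one reason trend-aware models such as ARIMA exist.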
In R, you can accomplish the same task in different ways.
This R document explains functions from the R package dplyr and, in some places, compares them with base R functions.
# load the dplyr library
library(dplyr)
# we are going to work with R's built-in dataset airquality
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 N...
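For readers who also work in Python, here is a hedged sketch of pandas analogues to common dplyr verbs (`filter`, `select`, `arrange`, `mutate`, `group_by` + `summarise`). It is not part of the original R document. It uses only the four complete rows of `airquality` printed above, and the variable names are hypothetical. One behavioural difference worth knowing: pandas' `mean` skips missing values by default, whereas R's `mean` returns `NA` unless `na.rm = TRUE` is set, which matters for the full `airquality` data because `Ozone` contains `NA`s.

```python
import pandas as pd

# The first four rows of R's built-in airquality data, as printed above
airquality = pd.DataFrame({
    "Ozone": [41, 36, 12, 18],
    "Solar.R": [190, 118, 149, 313],
    "Wind": [7.4, 8.0, 12.6, 11.5],
    "Temp": [67, 72, 74, 62],
    "Month": [5, 5, 5, 5],
    "Day": [1, 2, 3, 4],
})

# dplyr: airquality %>% filter(Temp > 70)
hot = airquality[airquality["Temp"] > 70]

# dplyr: select(airquality, Ozone, Temp)
ozone_temp = airquality[["Ozone", "Temp"]]

# dplyr: arrange(airquality, desc(Temp))
by_temp = airquality.sort_values("Temp", ascending=False)

# dplyr: mutate(airquality, TempC = (Temp - 32) * 5 / 9)
with_celsius = airquality.assign(TempC=(airquality["Temp"] - 32) * 5 / 9)

# dplyr: airquality %>% group_by(Month) %>% summarise(mean_ozone = mean(Ozone))
ozone_by_month = airquality.groupby("Month", as_index=False).agg(
    mean_ozone=("Ozone", "mean")
)

if __name__ == "__main__":
    print(hot)
    print(ozone_by_month)
```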
It's been a month, and what a month it has been! The new year is here already! One wonders where all the time has gone! I hope you also reflect on how far you have come in learning and equipping yourself for a successful career and life, and take pride in it! This 'present' day's effort will grant you the 'gift' of the future!
When we last communicated, we spoke of where to start and how to master different languages such as SAS, R and Python. Now that this crucial decision is made...
Wikipedia describes Analytics as "the discovery and communication of meaningful patterns in data." This comes in handy especially in areas rich with recorded information. Every day companies all over the world collect data about their customers and industries, simply as a routine activity during business transactions.
World-class firms use this collected data and leverage analytics 5 times more than others to describe, predict, and improve business performance. This kind of analytics is...
What's the hottest topic in today's business world? Data! What does every business need in order to analyze trends and make informed decisions? Data! On what basis are key policies framed that can make or break our modern world? Data again! Sounds interesting?! Imagine being involved in data analysis at some level and enabling businesses to make such key decisions... if this is something you would like to do for a living, then Data Analyst is the thing to be!
Typically, data analysis fits the bill...