SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for site users. It covers Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services and the industries that SigmaWay serves.

The Art of Predictive Modelling 

Your perspective on data depends on the type of task you want to accomplish. These tasks can be broadly classified as:

Analytics: Helps you explore what happened and why.

Monitoring: Looking at events as they occur to find abnormalities.

Prediction: Predicting what might happen in the future.

Some of the most popular algorithms that can be applied to predict future trends are:

The Ensemble Model: It combines the outputs of multiple models to arrive at a decision. However, one has to understand how to pick the right models and which problem one wants to solve.

Unsupervised Clustering Algorithms: These algorithms help group similar people and objects together.

Regression Algorithms: These are used to predict future values for a product or service.
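As a rough illustration of the regression idea, here is a minimal sketch that fits a straight line by ordinary least squares and extrapolates one step ahead. The sales figures and the names fit_line and forecast_month_6 are made up for this example and do not come from the linked article.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
# Extrapolate the fitted line to month 6.
forecast_month_6 = slope * 6 + intercept
```

A straight line is only the simplest choice; whether it suits a real product or service depends on the business context.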

There is no ideal formula for finding the most suitable method for predictive analytics. A strong level of business expertise is required to master the ‘art’ of predictive modelling. Read more at: http://www.analyticbridge.com/profiles/blogs/the-ultimate-guide-for-choosing-algorithms-for-predictive

 


Data Value Chain for GeoSpatial Data

The value of data has changed over time. Companies have realized that collecting, analyzing, sharing and selling data, and extracting actionable insights from it, are critical to the development of their organization. Geospatial data is captured and analyzed by engineers and product managers to develop creative solutions and thus increase productivity. People can view the flow of geospatial data from the instant it is collected and throughout its lifecycle using a framework known as the 'Data Value Chain'. Where data intersects with analytics, this information can be turned into decisions. A technological ecosystem built around a geospatial system provides new ways to work, reduce costs, accelerate schedules and supply high-value deliverables along the value chain. Read more at: http://dataconomy.com/2017/02/power-of-data-value-chain/


Data Science Challenges in Production Environment 

Very little time is spent thinking about how to deploy a data science model into production. As a result, many companies fail to earn the value that should come from their efforts and investments. In a production environment, data arrives continuously, results are computed and models are frequently retrained. The challenges faced by companies fall into four categories:

Small Data Teams: They mostly use small data and often don’t retrain models; the business team is involved in development projects.

Packagers: They often build their framework from scratch and practice informal A/B testing; they are generally not involved with the business team.

Industrialization Maniacs: These teams are IT-led and have automated processes for deployment and maintenance; business teams are not involved in monitoring and development.

The Big Data Lab: They use more complex technologies; business teams are involved before and after deployment of the data product.

Companies should understand that working in production is different from working with SQL databases in development. Moreover, real-time learning and multi-language environments make the process more complex. Strong collaboration between business and IT teams will also increase efficiency. Read more at: http://dataconomy.com/2017/02/value-from-data-science-production/

 


Rise of Data Science Platforms

"Data science platform" has become a buzzword of the decade. So, what is it? The sole purpose of a data science platform is to encapsulate all of the data science work by incorporating the tools required to collect, analyze and visualize data, build and deploy models, and generate reports. This toolkit makes it convenient to maintain, reproduce and scale up a project and to produce results dynamically. Adoption of data science platforms is expected to almost double by 2018 as more companies realize their potential benefits. Many data-driven businesses struggle to use data science tools effectively and lack an integrated approach to their data science technology stack, which makes it hard to find value in the data. On the other hand, companies that have already established data science platforms are excelling in the field.

Read more at : http://dataconomy.com/2017/02/tech-wave-data-science-platforms/


Deploying Machine Learning On Real Time Systems

The three critical steps involved in deploying a machine learning algorithm and exposing it to the real world are:

Define a goal based on a metric: Decide whether you want human-level intelligence or merely an acceptable level, as this decision will affect the time and engineering cost of your system. Also define a metric to measure the performance of your model.

Build the system: Build a minimum viable system without worrying much about accuracy. Then build an incremental strategy to improve your system by solving the problems you face in each iteration.

Refine the system with more data: Initial metric values are not indicators of real-life performance. Your data and users might change, so monitor system performance regularly, update the system with new data and fine-tune the model accordingly.
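The metric and monitoring steps can be sketched in a few lines. This is a minimal sketch that assumes accuracy is the chosen metric and 0.9 is the target; the function names, the threshold and the sample batches are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_retraining(predictions, labels, target=0.9):
    """Flag the model when accuracy on fresh data falls below the target."""
    return accuracy(predictions, labels) < target


# Hypothetical batches of predictions and true labels.
at_launch = needs_retraining([1, 0, 1, 1], [1, 0, 1, 1])    # every prediction correct
after_drift = needs_retraining([1, 0, 1, 1], [0, 0, 1, 0])  # half the predictions wrong
```

In practice the check would run on a schedule against freshly labelled data, and the right metric depends on the goal defined in the first step.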

Read more at : http://www.erogol.com/short-guide-deploy-machine-learning/


Enhancing Artificial Intelligence using Ensemble Training

Sometimes even machine learning algorithms behave so naively that an image recognition model can be fooled by an adversarial instance, i.e. an input created by changing a few pixels, either by taking the derivative of the model output or by exploiting genetic algorithms. Adversarial instances lie in low-probability regions, in contrast with the limited instances from high-probability regions on which the model was trained. A possible approach to this problem is ensemble training: letting multiple models back each other up. As we develop more artificially intelligent systems, such problems will become more common.
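To make the idea of models backing each other up concrete, here is a toy sketch of majority voting. The three "models" are hand-written threshold rules on made-up pixel values, not trained networks, and all names are hypothetical. Voting does not guarantee robustness; it only helps when the models fail on different inputs.

```python
from collections import Counter


def model_a(image):
    """Toy rule: classify by mean brightness."""
    return "cat" if sum(image) / len(image) > 0.5 else "dog"


def model_b(image):
    """Toy rule: classify by the brightest pixel."""
    return "cat" if max(image) > 0.6 else "dog"


def model_c(image):
    """Toy rule: classify by the first pixel only (easy to fool)."""
    return "cat" if image[0] > 0.5 else "dog"


def majority_vote(models, image):
    """Return the label predicted by most models."""
    votes = Counter(model(image) for model in models)
    return votes.most_common(1)[0][0]


models = [model_a, model_b, model_c]
clean_image = [0.9, 0.6, 0.7, 0.4]
perturbed_image = [0.45, 0.6, 0.7, 0.4]  # only the first pixel was changed

single_model_label = model_c(perturbed_image)             # the fragile model flips
ensemble_label = majority_vote(models, perturbed_image)   # the vote keeps the original label
```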

You can read more at: http://www.erogol.com/ensembling-against-adversarial-instances/


Effective Quality Management using Hypothesis Test

A business hypothesis is a foundational theoretical concept, and understanding it well helps you achieve business goals. For instance, hypothesis testing provides a mathematical way to answer questions such as whether you should spend on advertising or whether increasing the price of a product will affect your customers. Data collection is one part of the game, but correct data processing and interpretation is the final stage of your decision-making process. Hypothesis testing is used to infer whether the data provides enough evidence to support a hypothesis. There are various test methods: Parametric tests: z-test, t-test, F-test. Non-parametric tests: Wilcoxon rank-sum test, Kruskal-Wallis test and permutation test.
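Of the tests listed, the permutation test is the easiest to write out in full. The sketch below runs an exact two-sided permutation test on the difference of means; the two groups (say, daily sales with and without advertising) are invented for illustration, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical daily sales with and without advertising.
with_ads = [12, 14, 15, 16]
without_ads = [8, 9, 10, 11]
p_value = permutation_test(with_ads, without_ads)
```

A small p-value means that a difference this large would rarely arise if the group labels were assigned at random. Full enumeration is only feasible for small samples; larger ones are usually handled by random resampling.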

Read more at : http://www.datasciencecentral.com/profiles/blogs/importance-of-hypothesis-testing-in-quality-management


Hadoop Architecture for Big Data Analytics

 

The emergence of massive unstructured data sources like Facebook and Twitter has created a need for distributed processing systems for Big Data Analytics. Hadoop (a Java-based programming framework) has become the first choice of developers and industry experts, mainly because it is highly scalable, flexible and cheap. An application is broken down into many small parts that can run on thousands of nodes to achieve fast computing speed and reduce overall operation time. The Hadoop architecture continues to operate even if a node fails. Its design allows you to process large volumes of data and extract computationally expensive features of users and customers.
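Hadoop's processing model is MapReduce, and the way a job is broken into small parts can be imitated on a single machine. The sketch below is plain Python, not Hadoop code: the function names and the two text chunks are hypothetical, and on a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each chunk stands in for a block of input stored on a separate node.
chunks = ["big data needs big clusters", "data beats opinions"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```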

Read more at : http://www.datasciencecentral.com/forum/topics/how-to-use-hadoop-for-data-science


Good Statistical Practice

You can’t be a good data scientist unless you have a good hold on statistics and know your way around data. Here are some simple tips for being an effective data scientist:
Statistical Methods Should Enable Data to Answer Scientific Questions - Inexperienced data scientists tend to take the link between data and scientific issues for granted, and hence often jump directly to a technique based on the data structure rather than the scientific goal.
Signals Always Come with Noise - Before working with data, analyse it and extract the part that is actually usable.
Data Quality Matters - Many novice data scientists ignore this fact and tend to use any kind of data available to them. It is always good practice to set norms for data quality.
Check Your Assumptions - The assumptions you make affect your output as much as your data does, so take special care when making any assumption, as it will affect your whole model as well as your results.
These are some of the things to keep in mind when working with data. To know more, you can read the full article by Vincent Granville at http://www.datasciencecentral.com/profiles/blogs/ten-simple-rules-for-effective-statistical-practice

 


Scaling Data Models in Production Environment

Often the outputs of data models developed by data scientists end up in a report that summarizes the state of the business and is used by stakeholders to make decisions. But it is also necessary to have a system that can predict future outcomes in real time. This can be done by integrating the model into a production environment; however, that requires advanced engineering skills, and data scientists cannot do it alone. The deployment process broadly follows 7 steps:

1. Refactor the model code

2. Walk through the code and determine how it slots into the engineering cycle

3. Re-write into a production stack language or PMML

4. Implement it into the tech stack

5. Test performance

6. Tweak the model based on test results

7. Slowly roll out the model.
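The last step, a slow rollout, is often done by sending a fixed share of users to the new model. The sketch below shows one possible way to do that with a deterministic hash; it is an assumption about how a rollout could work, not the method from the linked article, and the function name and user IDs are hypothetical.

```python
import hashlib


def in_rollout(user_id, percent):
    """Deterministically place a user in one of 100 buckets and admit the first `percent`."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# Count how many of 1000 hypothetical users would see the new model at a 10% rollout.
users = [f"user-{i}" for i in range(1000)]
exposed = sum(in_rollout(user, 10) for user in users)
```

Because the bucket depends only on the user ID, a user who sees the new model at 10% still sees it when the share is raised to 50%.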

Today many companies are adopting tools that make this process faster, so they can reap the benefits of data-driven decision making.

Read more at : https://www.datascience.com/blog/navigating-the-pitfalls-of-model-deployment

 


Recommenders : The Future of E-commerce

Recommender systems have become the backbone of the ecommerce sector. They have helped companies like Amazon and Netflix increase their revenue by as much as 10% to 25%.
Hence the need of the hour is to optimize their performance.
So, what are recommenders? Recommenders are applications that personalize your customer’s shopping experience by recommending the next best options in light of their recent buying or browsing activity. Recent developments in analytics and machine learning have led to many state-of-the-art recommender systems.
Types of Recommenders: There are broadly five types of recommender systems, which are as follows:
1. Most Popular Item
2. Association and Market Basket Models
3. Content Filtering
4. Collaborative Filtering
5. Hybrid Models
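The first two types are simple enough to sketch with counting alone. The example below is a toy illustration on invented shopping baskets; the function names are hypothetical, and real market basket models use measures such as support, confidence and lift rather than raw counts.

```python
from collections import Counter

# Hypothetical purchase history: one set of items per order.
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"phone", "case"},
    {"laptop", "mouse", "case"},
]


def most_popular(baskets, n=1):
    """Type 1: recommend the items that appear in the most baskets."""
    counts = Counter(item for basket in baskets for item in basket)
    return [item for item, _ in counts.most_common(n)]


def bought_together(baskets, item):
    """Type 2: count which items share a basket with the given item."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    return counts


top_item = most_popular(baskets)[0]
companion = bought_together(baskets, "laptop").most_common(1)[0][0]
```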

In the coming years, recommender systems will be used by almost every organisation, big or small, and will become an inseparable part of the ecommerce world.


To know more read the article by William Vorhies at: http://www.datasciencecentral.com/profiles/blogs/understanding-and-selecting-recommenders-1

 

 


2016: The year of Deep Learning

2016 was the year of deep learning, with some big breakthroughs achieved by Google and DeepMind. Some of the most significant achievements are as follows:

AlphaGo triumphs in Go showdown: AlphaGo, Google's AI for the game of Go, surprised everyone by beating Go champion Lee Sedol.

Bots kicking our butts in StarCraft: DeepMind set its sights on StarCraft II, partnering with Blizzard to open the game up as a research environment for AI bots.

DIY deep learning for Tic Tac Toe: AlphaToe, an AI bot, was able to outperform most of the people who played against it.

Google’s Multilingual Neural Machine Translation: Google built a model capable of translating text between multiple languages, reaching a new milestone in linguistics and NLP.

In a nutshell, 2016 was the year of deep learning, and many milestones once considered unachievable were reached during the year.

 To know more you can read the full article by Precy Kwan at http://www.datasciencecentral.com/profiles/blogs/year-in-review-deep-learning-2016

 


A Guide to Choosing Machine Learning Algorithms

Machine learning is the backbone of today’s insights on customers, products, costs and revenues, and it learns from the data provided to its algorithms. Hence algorithms are the next most important thing in data science after data.
So the question is: which algorithm should you use? Some of the most used algorithms and their use cases are as follows:

1) Decision Trees - Their output is easy to understand, and they can be used for investment decisions, customer churn, bank loan defaulters, etc.

2) Logistic Regression - It’s a powerful way of modeling a binomial outcome with one or more explanatory variables, and it can be used for predicting customer churn, credit scoring and fraud detection, measuring the effectiveness of marketing campaigns, etc.

3) Support Vector Machines - It’s a supervised machine learning technique that is widely used in pattern recognition and classification problems, and it can be used for detecting people with common diseases such as diabetes, hand-written character recognition, text categorization, etc.

4) Random Forest - It’s an ensemble of decision trees that can solve both regression and classification problems with large data sets. It is used in applications such as predicting high-risk patients, predicting parts failures in manufacturing, predicting loan defaulters, etc.
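To show why decision tree output is easy to read, here is a minimal sketch of a one-split tree (often called a decision stump), the simplest possible case. The churn data, the feature and the function names are all invented for illustration; real decision trees choose among many features and make several splits.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that misclassifies the fewest rows.

    The stump predicts 1 when value > threshold, otherwise 0.
    """
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(values)):
        errors = sum((v > threshold) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(value, threshold):
    """Apply the learned rule to a new value."""
    return int(value > threshold)


# Hypothetical data: months since last purchase, and whether the customer churned (1) or stayed (0).
months_inactive = [1, 2, 3, 8, 9, 12]
churned = [0, 0, 0, 1, 1, 1]

threshold, errors = fit_stump(months_inactive, churned)
will_churn = predict(5, threshold)
```

The learned rule reads as a single sentence: a customer inactive for more than `threshold` months is predicted to churn.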


Hence, based on your needs and the size of your dataset, you can choose the algorithm that best fits your application or problem.
You can read the full article by Sandeep Raut at http://www.datasciencecentral.com/profiles/blogs/want-to-know-how-to-choose-machine-learning-algorithm

 


Winning Data Strategy using Industrialized Machine Learning

The first building block of a winning business strategy is a map based on the business value of each question and an estimate of how much time it would take to get high-quality answers to that question. The idea is to break the business questions into groups that correspond to real-time data systems. This allows you to focus on one specific system at a time, build a strong strategy and optimize the sequence in which each sub-question needs to be answered, depending on its current business value. A pattern of actions for data strategy begins with a hypothesis and the collection of relevant data, followed by building models to explain the data and evaluating their credibility for future predictions. The entire process is carried out on an enterprise-scale digital infrastructure using Industrialized Machine Learning (IML). This approach can have a huge impact on the natural resources and healthcare industries as well.

Read more at : https://blogs.csc.com/2016/07/05/how-to-build-and-execute-a-real-data-strategy/

 


A Neural Network Approach To Raise Your E-Book Business 

E-book business communities generate a lot of revenue every day, but it is sometimes difficult for authors to earn a decent amount because of a lack of preparation and research. No matter how unique and interesting your content is, if it doesn't appear on the first or second page of search results, it's highly unlikely that a visitor will ever read it. The story doesn't end there: one must cleverly select a title and cover that attract the reader, as these change the way we think. A neural network approach using Doc2Vec to find the best words for a title can be adopted to increase revenue. It involves training a shallow two-layer neural network, which operates in unsupervised mode and forms clusters of the most similar words (using the cosine similarity metric) based on context.
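The cosine similarity metric mentioned above is easy to compute by hand. In the sketch below the three word vectors are invented for illustration; they are not the output of Doc2Vec, which would learn much longer vectors from a corpus of titles.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical word vectors (illustrative values only).
vectors = {
    "guide": [1.0, 2.0, 0.0],
    "handbook": [2.0, 4.0, 0.0],
    "dragon": [0.0, 0.0, 3.0],
}

similar = cosine_similarity(vectors["guide"], vectors["handbook"])
unrelated = cosine_similarity(vectors["guide"], vectors["dragon"])
```

Words whose vectors point in nearly the same direction end up in the same cluster, which is how candidate title words can be grouped by context.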

Read more about the technical details here: http://www.datasciencecentral.com/profiles/blogs/use-neural-networks-to-find-the-best-words-to-title-your-ebook


Automatic Debt Management System 

Big Data Analytics and Business Intelligence are changing the way businesses interact with customers. Modern big data solutions have enabled automated decision making in debt management systems for client handling processes. Correct implementation of these tools provides a more personalized experience for each customer and avoids infringements. Debt management automation has proven to be a successful way to balance meticulous efficiency and customer satisfaction. Such a CRM automates many processes, so a small team needs only days to complete the debt collection process. Analytics has not just accelerated debt collection, but also enhanced customer relations.

You can read more at: http://www.dataminingblog.com/what-could-big-data-mean-for-debt-management/

 

 


Essence of Qualitative Research

Global markets are becoming more complex each day, and it has therefore become essential for business intelligence teams to apply advanced methods of data interpretation. Many of these teams believe that only decisions based on quantitative data can be justified. However, quantitative research can go wrong in several ways, and the truth often comes out only when you meet people, talk to them and involve them in creative exercises.

Read more at: http://www.dataversity.net/science-big-data-art-interpretation/


Importance of Data Preparation

Data is the backbone of analytics and machine learning, and hence one of the most important tasks in analytics is to get the right kind of data in the required format. The importance of this task is clear from the fact that around 60 to 80 percent of an analyst's time is spent preparing data.
What exactly is data preparation? In a nutshell, it is the process of collecting, cleaning, processing and consolidating data for use in analysis. It enriches the data, transforms it and improves the accuracy of the outcome.
How is it done? It is mostly done through analytics or traditional extract, transform and load (ETL) tools. ETL tools include self-service data preparation tools, data cleansing and manipulation tools, etc.
Since data is the foundation of analytics, the right data helps in analysing a situation better and helps organizations react positively to market shifts.
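A small cleaning pass shows what this preparation involves in practice. The sketch below is a minimal example on invented customer rows; the field names and the prepare function are hypothetical, and dedicated ETL tools handle far more cases than this.

```python
# Hypothetical raw export with stray whitespace, mixed case, a missing value and a duplicate.
raw_rows = [
    {"customer": " Alice ", "spend": "120.50"},
    {"customer": "BOB", "spend": ""},
    {"customer": "alice", "spend": "120.50"},
    {"customer": "Carol", "spend": "80"},
]


def prepare(rows):
    """Normalize names, convert spend to a number, and drop incomplete or duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["customer"].strip().title()
        spend = row["spend"].strip()
        if not spend:
            continue  # drop rows with a missing value
        record = (name, float(spend))
        if record in seen:
            continue  # drop duplicates that differ only in formatting
        seen.add(record)
        cleaned.append({"customer": name, "spend": record[1]})
    return cleaned


prepared = prepare(raw_rows)
```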
To know more read the full article by Ashish Sukhadeve (business analytics professional) at: http://www.datasciencecentral.com/profiles/blogs/why-data-preparation-should-not-be-overlooked

 


Big Data Integration for Advanced Analytics 

Modern Big Data consumption requires data integration before the data actually hits the business intelligence tools. This includes leveraging complex and unstructured data and enabling raw data to flow securely through the business. Today, even the smallest companies produce huge amounts of data across systems that need to communicate with each other, and they therefore require a platform to pipe all these data sources into data lakes.

Read more at: http://www.dataversity.net/dont-put-cart-horse-comes-big-data/


Building Consumer Intelligence System

It is evident that a great customer experience is one of the signs of a healthy business model. Machine Learning and Data Analytics play a fundamental role in building consumer intelligence systems. It is important to capture data, and there is no single magic source from which to collect it. Telecoms are making billions by selling data. You need to ensure that the data is relevant to the business. Once you have the right data, you are ready to model, design, engineer and deploy your 360-degree customer view platform and achieve an enhanced customer experience for your organization.

You can read more at: http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A508502

 
