
Wednesday, July 6, 2022

 
Credit: Datascience Foundation

Overfitting:

Overfitting is one of the most common practical difficulties with decision tree models. It can be addressed by setting constraints on the model parameters and by pruning (discussed in detail below).
Credit: ROUCHI.AI

Not fit for continuous variables:

When working with continuous numerical variables, a decision tree loses information as it bins the variable into discrete categories.

Cannot extrapolate:

A decision tree cannot extrapolate beyond the range of the training data. For regression problems, for example, the prediction of a decision tree is the average target value of all the training samples in a particular leaf.

Credit: Google
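The averaging-in-a-leaf point can be illustrated with a toy sketch. This is a hand-rolled depth-1 "tree" (one split, two leaves), not sklearn's implementation, and the data is purely illustrative:

```python
# A toy illustration of why a regression tree cannot extrapolate:
# a leaf always predicts the mean of the training targets it holds.

def leaf_mean_predict(split, X_train, y_train, x):
    """Predict with a depth-1 regression 'tree': one split, two leaves,
    each leaf returning the mean target of its training samples."""
    left = [y for xv, y in zip(X_train, y_train) if xv <= split]
    right = [y for xv, y in zip(X_train, y_train) if xv > split]
    leaf = left if x <= split else right
    return sum(leaf) / len(leaf)

X_train = [1, 2, 3, 4, 5, 6, 7, 8]
y_train = [2 * xv for xv in X_train]        # y = 2x on the training range

# Inside the training range the leaf means roughly track the trend...
print(leaf_mean_predict(4, X_train, y_train, 3))    # mean of {2,4,6,8} = 5.0
# ...but far outside it the prediction is capped at the rightmost leaf's mean.
print(leaf_mean_predict(4, X_train, y_train, 100))  # 13.0, nowhere near 200
```

However far to the right we query, the answer can never exceed the mean of the largest leaf, which is the extrapolation limitation in miniature.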

Decision trees can be unstable:

Small variations in the data can result in a completely different tree being generated. This is called variance, and it can be reduced by methods such as bagging and boosting.

No guarantee of returning the globally optimal decision tree:

This can be mitigated by training multiple trees in an ensemble, where the features and samples are randomly drawn with replacement.
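The "sampled with replacement" step can be sketched in a few lines. This is just the bootstrap-sampling idea that ensembles such as random forests rely on, with made-up toy data:

```python
import random

# Bootstrap sampling: each tree in an ensemble is trained on its own
# resampled view of the data, drawn with replacement.

def bootstrap_sample(data, seed=0):
    rng = random.Random(seed)
    # Draw len(data) rows *with replacement*: some rows repeat, others are left out.
    return rng.choices(data, k=len(data))

data = list(range(10))
sample = bootstrap_sample(data, seed=42)
print(sample)  # same length as the original, duplicates are likely
```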


Credit: Medium

What is hyperparameter tuning?

Hyperparameter tuning is searching the hyperparameter space for the set of values that optimizes your model architecture.

How to determine hyperparameters?

Hyperparameter tuning is tricky because there is no direct way to calculate how a change in a hyperparameter value will reduce the loss of your model, so we usually resort to experimentation.

Step 1:

Define a range of possible values for each hyperparameter.

To determine the range, first understand what these hyperparameters mean and how changing a hyperparameter will affect your model architecture; from that, try to anticipate how your model's performance might change.

Step 2:

Apply grid search (common but expensive), or smarter and less expensive methods such as random search and Bayesian optimization, to determine the best values.
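The two steps above can be sketched with scikit-learn's GridSearchCV (assuming scikit-learn is installed; the grid values and the iris dataset here are only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 1: define a range of possible values for each hyperparameter.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Step 2: exhaustively try every combination with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)            # the combination with the best CV accuracy
print(round(search.best_score_, 3))
```

Random search and Bayesian optimization follow the same fit-and-compare pattern but sample the grid instead of enumerating it, which is why they scale better to large hyperparameter spaces.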

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation. 

Why non-parametric?

Non-parametric machine learning algorithms are those that do not make strong assumptions about the form of the mapping function. By not committing to a fixed functional form, they are free to learn any shape the training data supports.

Decision trees are based on a hierarchical structure, and they can be used to solve a wide variety of problems, including regression and classification tasks.

Hyperparameters

max_depth: indicates how deep the decision tree is allowed to grow.

The deeper the tree, the more complex it becomes and the more information it captures about the data, since it contains more splits.

Limitation: a very deep tree is prone to overfitting; it predicts well on the training data but may fail to generalize to new data. Training error may be low, but the test error will be high in this case.

min_samples_split: the minimum number of samples required to split an internal node.

We can specify either an absolute number or a fraction denoting a percentage of the samples in an internal node.

min_samples_leaf: the minimum number of samples required at a leaf node (a terminal node that cannot be split further).

max_features: the number of features to consider when looking for the best split.

We can specify either a number to denote the maximum number of features or a fraction to denote a percentage of the features to consider when making a split.
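A minimal sketch of how these four hyperparameters are passed to scikit-learn's DecisionTreeClassifier (assuming scikit-learn is installed; the particular values and the iris dataset are only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrain the tree with the hyperparameters described above.
tree = DecisionTreeClassifier(
    max_depth=3,            # how deep the tree may grow
    min_samples_split=5,    # an internal node needs >= 5 samples to split
    min_samples_leaf=2,     # every leaf keeps >= 2 samples
    max_features=0.5,       # consider half of the features at each split
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth())     # never exceeds max_depth
```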


Tuesday, July 5, 2022

I assume that whoever is reading this article has either heard of Streamlit or has actually created an application utilizing it.

For your reference, Streamlit is among the simplest libraries I've ever used to create interactive dashboards and GUI applications. You don't need any prior HTML, Javascript, or CSS experience.

Cool, huh.

However, in this article I want to concentrate on how to deploy a Streamlit app on Hugging Face Spaces. Besides Hugging Face, a Streamlit app can also be deployed on Microsoft Azure, AWS, GCP, Heroku, and Streamlit Cloud.

But I found Hugging Face the simplest.

Steps:

1. Create an account on Hugging Face.


2. Click on 'Create new space'. You will be navigated to the screen below.


3. Provide a space name. Choose the license as 'Other' if you don't have any license. Select the Space SDK as 'Streamlit'. Select 'Public'. Then click on 'Create Space'.


4. The screen below will be displayed.


5. If you want to push your code directly from Git, you can follow the three steps mentioned on the page.

But if you don't have a Git repository, then simply click on the 'Create' link.


6. Paste your code and click 'Commit new file'.


7. Then you will see the status 'Building' at the top.


8. When the build is complete, the status will change to 'Running'.


9. Click on the 'App' tab and you will be able to see your app.


10. You can also create your requirements.txt and other files by clicking on 'Files and versions'.
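Under the hood, a Space is just a repository of files. As a rough sketch, the code you paste in step 6 could look like the following (app.py is the default entry file the Streamlit SDK looks for; the greeting app itself is purely illustrative):

```python
# app.py — a purely illustrative example to paste in step 6
import streamlit as st

st.title("Hello from Hugging Face Spaces")
name = st.text_input("Your name")
if name:
    st.write(f"Hello, {name}!")
```

Any extra pip packages the app imports (pandas, scikit-learn, etc.) go one per line into the requirements.txt created in step 10; streamlit itself is provided by the Space SDK.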



Monday, May 30, 2022

map(): map(function, iterable)

The map function takes a function and an iterable as arguments and applies the function to each element of the iterable separately.

The returned map object can be passed to functions like list(), set(), etc. to get the values out of it.


filter(): The filter function operates on a list and returns a subset of that list after applying the filtering rule.

reduce(): The reduce function transforms a given list into a single value by applying a given function cumulatively to all the elements.

You have to import reduce() from functools, otherwise a NameError will be thrown saying 'reduce' is not defined.
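Putting the three functions together in one runnable snippet (the list of numbers is only illustrative):

```python
from functools import reduce  # without this: NameError: name 'reduce' is not defined

nums = [1, 2, 3, 4, 5]

# map(): apply a function to each element of an iterable.
squares = list(map(lambda x: x * x, nums))        # [1, 4, 9, 16, 25]

# filter(): keep only the elements for which the rule returns True.
evens = list(filter(lambda x: x % 2 == 0, nums))  # [2, 4]

# reduce(): fold the whole list into a single value.
total = reduce(lambda a, b: a + b, nums)          # 15

print(squares, evens, total)
```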





Saturday, December 18, 2021

Power transform:

A power transformation maps data from an arbitrary distribution to something close to a Gaussian distribution, since normality of the features is assumed in many modeling scenarios. Transforming the data also helps stabilize variance and minimize skewness.

For instance, some algorithms perform better or converge faster when the features are close to normally distributed:

- linear and logistic regression
- nearest neighbors
- neural networks
- support vector machines with radial basis kernel functions
- principal components analysis
- linear discriminant analysis

The PowerTransformer class can be accessed from the sklearn.preprocessing package. PowerTransformer provides two transformations: the Yeo-Johnson transform and the Box-Cox transform.

The Yeo-Johnson transform is:

    x^(λ) = ((x + 1)^λ − 1) / λ              if λ ≠ 0, x ≥ 0
    x^(λ) = ln(x + 1)                        if λ = 0, x ≥ 0
    x^(λ) = −((−x + 1)^(2−λ) − 1) / (2 − λ)  if λ ≠ 2, x < 0
    x^(λ) = −ln(−x + 1)                      if λ = 2, x < 0

The Box-Cox transform is:

    x^(λ) = (x^λ − 1) / λ                    if λ ≠ 0
    x^(λ) = ln(x)                            if λ = 0

Important points:

Box-Cox can be applied to strictly positive data only.

Both transformations are parameterized by λ, which is determined through maximum likelihood estimation.
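As a sketch of the Box-Cox formula itself, here is a hand-rolled toy version for a fixed λ (in practice you would use sklearn.preprocessing.PowerTransformer(method="box-cox"), which also estimates λ by maximum likelihood):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a single strictly positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox is defined for strictly positive data only")
    if lam == 0:
        return math.log(x)          # the lambda = 0 limit is the natural log
    return (x ** lam - 1) / lam     # the general lambda != 0 case

print(box_cox(10.0, 0))    # ln(10)
print(box_cox(10.0, 1))    # 9.0 — lambda = 1 just shifts the data by 1
print(box_cox(10.0, 0.5))  # 2 * (sqrt(10) - 1)
```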


References:

https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-scaler

Friday, July 30, 2021

1. Log in to the Oracle Fusion application.

2. Go to the Navigator and, under Configuration, choose Sandboxes.

3. Click on 'Create Sandbox' and provide a name. Make sure there is no space in the sandbox name.



4. Select the Appearance tool.

5. Click on 'Create and Enter'.



6. A yellow pop-up bar will be visible. Click on Tools and select Appearance.



7. You can see there is a 'Logo' option. Select File and upload your logo image.



8. Click on Actions and choose 'Save As'.



9. Your updated logo will be displayed.



10. Click on the Apply button.

11. After that, click on your sandbox name, 'LogoChange', on the left-hand side and click on Publish.

12. It will then navigate to the Sandboxes page. Click on Publish, then Done.

13. Your logo will be changed.
