Saturday, August 3, 2019

Georgia Tech - Boot Camp --PreWork -- Day 1 of Study

Started reading https://coding-bootcamp-dataviz-prework.readthedocs-hosted.com/en/latest/modules/module-2-machine-ready/.

Downloaded most of the software mentioned at that link.

------Links for git----------
Installing Windows Bash
https://itsfoss.com/install-bash-on-windows/

Windows Subsystem for Linux has no installed distributions.
Distributions can be installed by visiting the Windows Store:
https://aka.ms/wslstore

Windows store -- Ubuntu Bash

Course Work
https://bootcampspot.com/coursework/48638/show

Install your tools
https://coding-bootcamp-dataviz-prework.readthedocs-hosted.com/en/latest/modules/module-2-machine-ready/

setting up git username
https://help.github.com/en/articles/setting-your-username-in-git

Git download
https://git-scm.com/download/win

Directory
C:\Program Files\Git
less ~/.gitconfig

https://github.com/settings/tokens

Control Panel\All Control Panel Items\Credential Manager

Setting up commit email
https://help.github.com/en/articles/setting-your-commit-email-address


C:\Users\magogate

------Link for git ends---

The first thing I am writing about is the Git setup. The following 2 links, which I got from my module 0 instructions, are:
1. https://help.github.com/en/articles/setting-your-username-in-git
           Here I set the user name to mandargogate (mgogate was a good choice too, but it doesn't give my full name). I just followed the instructions given on the site. To open Git Bash, type Git Bash in Run.
2. https://help.github.com/en/articles/setting-your-commit-email-address
           Here I logged in to https://github.com/login with my username, mandargogate@outlook.com, and my password :)

Changed my git commit email to mandargogate@outlook.com

manda@DESKTOP-28T0UPP MINGW64 ~
$ git config --global user.email "mandargogate@outlook.com"
manda@DESKTOP-28T0UPP MINGW64 ~
$ git config --global user.email
mandargogate@outlook.com

Main installation instructions are at following link
https://coding-bootcamp-dataviz-prework.readthedocs-hosted.com/en/latest/modules/get-yo-tools-installed-on-windows/

Creating SSH key
manda@DESKTOP-28T0UPP MINGW64 ~
$ ssh-keygen -t rsa -b 4096 -C "mandargogate@outlook.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/c/Users/manda/.ssh/id_rsa):
Created directory '/c/Users/manda/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /c/Users/manda/.ssh/id_rsa.
Your public key has been saved in /c/Users/manda/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:dTCi3O/2b0hvmqQGXf9U/mnebCEMdY5YHmzaIrqPpJ4 mandargogate@outlook.com
The key's randomart image is:
+---[RSA 4096]----+
|        . o .    |
|     . o . o * . |
|      o . . X =  |
|         + *.+ ..|
|        S.o.+. ..|
|       .... .o..o|
|       ...o..o.o+|
|      +....+..++=|
|    .E ..o. +=+oo|
+----[SHA256]-----+

manda@DESKTOP-28T0UPP MINGW64 ~
$ eval "$(ssh-agent -s)"
Agent pid 1435
manda@DESKTOP-28T0UPP MINGW64 ~
$ ssh-add ~/.ssh/id_rsa
Identity added: /c/Users/manda/.ssh/id_rsa (mandargogate@outlook.com)
manda@DESKTOP-28T0UPP MINGW64 ~
$ clip < ~/.ssh/id_rsa.pub
manda@DESKTOP-28T0UPP MINGW64 ~
$ ssh -T git@github.com
The authenticity of host 'github.com (192.30.253.113)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com,192.30.253.113' (RSA) to the list of known hosts.
Hi magogate! You've successfully authenticated, but GitHub does not provide shell access.
manda@DESKTOP-28T0UPP MINGW64 ~
$


Finally, created an account at https://dashboard.heroku.com/apps associated with mandargogate@gmail.com
Opened a command prompt and logged in:
C:\Users\manda>heroku login
heroku: Press any key to open up the browser to login or q to exit:
Opening browser to https://cli-auth.heroku.com/auth/browser/ef909dc2-da03-4637-a57a-c7bcca7f0985
Logging in... done
Logged in as mandargogate@gmail.com

On 08/19/2019 I accepted an invitation to the Georgia Tech GitLab using the link below. Created a new user, mandargogate, with the email mandargogate@outlook.com.

https://gt.bootcampcontent.com/dashboard/projects

After logging in, it is showing me one entry of project details


 Day 2 of Study
-- whatever starts with stu is for student exercise

-- In Excel you can pass a range as the input to a function. If the data spans several columns, just select the entire range after typing the formula.

How to create a chart in Excel showing standard deviation

Named range -- under Formulas -- Name Manager
select column --- right click -- Define Name
$ around the cell reference (e.g. $A$1) -- the reference does not increment when you drag the formula down

Choose(randbetween) -- Conditional Formatting
ifs
switch
=IF(AND(D2>5, D2<10),TRUE,FALSE)
=TEXT(date,"mmm-yy") to get the month and year (e.g. Aug-19) out of a date


----------Python Session on 5 Sep 2019 --

Visual Studio Code
Ctrl + Shift + P -- to open the command palette
Ctrl + / -- to comment a line

------------Python session on 9 Sep 2019
-- Somehow the command is not working, but it works when we run it manually
at the command prompt
jupyter notebook

at the command prompt
jupyter lab

JupyterLab
C:\Users\manda\Anaconda3\Lib\site-packages\jupyterlab

Go to Anaconda Navigator
Click on JupyterLab and then
another window will open
click on Python
another window will open

To run a cell, either click Run or hit Shift+Enter
Shift+Tab will show the function's signature and documentation

In the call below,
the first arg is the starting point
the second arg is the end point (exclusive)
the last arg is the step (increment by value)
list(range(1,10,5))

-- a negative step counts down, so the values come out in descending order
list(range(30,1,-3))
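
A quick check of the two calls above, plus one extra call (my own addition, not from class) to show that the end point is excluded:

```python
# range(start, stop, step): start is included, stop is excluded.
ascending = list(range(1, 10, 5))
descending = list(range(30, 1, -3))

# Extra illustration (not from the class notes): the stop value itself never appears.
stops_before_11 = list(range(1, 11, 5))
reaches_11 = list(range(1, 12, 5))

print(ascending)        # [1, 6]
print(descending)       # [30, 27, 24, 21, 18, 15, 12, 9, 6, 3]
print(stops_before_11)  # [1, 6]
print(reaches_11)       # [1, 6, 11]
```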

In JupyterLab, if the circle on the right-hand side is solid black, your kernel is busy; you may have run into an infinite loop. To stop it, go to the Kernel menu in the same window and select Interrupt Kernel or Restart Kernel.

pwd works in jupyter notebook

SQL Vs MongoDB
Database/Database
Table/Collection
Row/Document
Column/Field

To open a file in VS Code from command prompt
code <file name>

to open a file in explorer from command prompt
explorer

conda install pymongo
or
pip install pymongo

On the Anaconda Prompt it got installed quickly. Not sure why conda install would not finish on the normal prompt.

After installing on the Anaconda Prompt, the code still did not work because it could not find the pymongo module.

I terminated the conda installation on the normal prompt and ran pip install at the same prompt.

Once the pip installation finished, the code in VS Code was able to find the pymongo module.

 Directory of D:\MyWork\GeorgiaTech\ClassesWork\11-Nov

11/05/2019  08:47 PM    <DIR>          .
11/05/2019  08:47 PM    <DIR>          ..
               0 File(s)              0 bytes
               2 Dir(s)  216,569,704,448 bytes free

D:\MyWork\GeorgiaTech\ClassesWork\11-Nov>code mongodbApp.py

D:\MyWork\GeorgiaTech\ClassesWork\11-Nov>conda install pymongo
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

CondaError: KeyboardInterrupt

Terminate batch job (Y/N)? Y

D:\MyWork\GeorgiaTech\ClassesWork\11-Nov>pip install pymongo
Collecting pymongo
  Downloading https://files.pythonhosted.org/packages/c9/36/715c4ccace03a20cf7e8f15a670f651615744987af62fad8b48bea8f65f9/pymongo-3.9.0-cp37-cp37m-win_amd64.whl (351kB)
     |████████████████████████████████| 358kB 726kB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.9.0
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

To open MongoDB
use MongoDB Compass

--------11/07----
How to check if the MongoDB service is working?
Open a command prompt and type mongod. If it prints its startup output and keeps running, the MongoDB server is up.

https://stackoverflow.com/questions/35309042/python-how-to-set-global-variables-in-flask


C:\Users\manda>pip install requests
Collecting requests
  Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
     |████████████████████████████████| 61kB 563kB/s
Collecting certifi>=2017.4.17 (from requests)
  Downloading https://files.pythonhosted.org/packages/18/b0/8146a4f8dd402f60744fa380bc73ca47303cccf8b9190fd16a827281eac2/certifi-2019.9.11-py2.py3-none-any.whl (154kB)
     |████████████████████████████████| 163kB 1.1MB/s
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests)
  Downloading https://files.pythonhosted.org/packages/e0/da/55f51ea951e1b7c63a579c09dd7db825bb730ec1fe9c0180fc77bfb31448/urllib3-1.25.6-py2.py3-none-any.whl (125kB)
     |████████████████████████████████| 133kB 1.1MB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
     |████████████████████████████████| 143kB 595kB/s
Collecting idna<2.9,>=2.5 (from requests)
  Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
     |████████████████████████████████| 61kB 990kB/s
Installing collected packages: certifi, urllib3, chardet, idna, requests
  WARNING: The script chardetect.exe is installed in 'C:\Users\manda\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed certifi-2019.9.11 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.6
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

---------------here bs4 stands for Beautiful Soup version 4; as the log shows, the bs4 package just pulls in beautifulsoup4
C:\Users\manda>pip install bs4
Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/3b/c8/a55eb6ea11cd7e5ac4bacdf92bac4693b90d3ba79268be16527555e186f0/beautifulsoup4-4.8.1-py3-none-any.whl (101kB)
     |████████████████████████████████| 102kB 930kB/s
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4, bs4
  Running setup.py install for bs4 ... done
Successfully installed beautifulsoup4-4.8.1 bs4-0.0.1 soupsieve-1.9.5
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\manda>pip install beautifulsoup4
Requirement already satisfied: beautifulsoup4 in c:\users\manda\appdata\local\packages\pythonsoftwarefoundation.python.3.7_qbz5n2kfra8p0\localcache\local-packages\python37\site-packages (4.8.1)
Requirement already satisfied: soupsieve>=1.2 in c:\users\manda\appdata\local\packages\pythonsoftwarefoundation.python.3.7_qbz5n2kfra8p0\localcache\local-packages\python37\site-packages (from beautifulsoup4) (1.9.5)
WARNING: You are using pip version 19.2.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

------ 11/12/2019 ------
To run install commands from a Jupyter notebook, just put "!" at the beginning of the command. As you can see below, for splinter you only have to add the !

!pip install splinter

Selenium is used for automated testing and for web scraping, for example by clicking through to the next page.

The following is an ETL project done by Fayyaz; the visualization they used was especially good.
https://github.com/faradical/ETL-Project

------11/23-----
To open the entire contents of a folder in Visual Studio Code, go to Explorer, right-click the folder, and click the Open with Code option.

To open a file side by side in a tab, click the file in the left panel and use the open-to-the-side option (similar to Notepad++), or drag the file to the side, which also opens it side by side.

---12/03----

wy3qygpXa725G5LQVSxt

This site gives us data, which we can fetch using the above API key.

-- 12/05 --
To start a web server with Python
python.exe -m http.server

PS D:\MyWork\GeorgiaTech\ClassesWork\5_Dec\Activities\Activities> python.exe -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

The server reports its address as 0.0.0.0 (all interfaces); in the browser, change it to localhost or 127.0.0.1 (http://localhost:8000/).
It looks for a file named index.html; if that file is available, it shows up automatically. This is similar to web.xml, where you make the entry for index.html, or Flask, where you map a file to /, the root path.
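
The same behavior can be sketched from Python instead of the command line. This is only a sketch under my own assumptions: the temporary folder, the index.html content, and the names QuietHandler and fetch_root are made up for illustration. It serves a folder on a free local port and requests the root path, which returns index.html.

```python
import http.server
import threading
import urllib.request
from functools import partial
from pathlib import Path
from tempfile import TemporaryDirectory


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Same handler that `python -m http.server` uses, with request logging silenced."""

    def log_message(self, format, *args):
        pass


def fetch_root(directory):
    """Serve `directory` on a free local port and return the body served at '/'."""
    handler = partial(QuietHandler, directory=str(directory))
    # Port 0 asks the OS for any free port; 127.0.0.1 keeps the server local only.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Bypass any configured proxy so the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/") as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


with TemporaryDirectory() as tmp:
    # Hypothetical page content, only for the demonstration.
    Path(tmp, "index.html").write_text("<h1>hello</h1>", encoding="utf-8")
    body = fetch_root(tmp)

print(body)
```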

GitHub
Go to Settings after creating the repo
Go to GitHub Pages
Select the master branch in the first drop-down
After that, go to the io location, which is the main directory for GitHub Pages -- https://magogate.github.io
and then go to the particular repository. As shown below, week15 is a new repo we created; it acts as a folder under the main io path.
https://magogate.github.io/week15/

Now.. here
-- 12/07 ---
Declarative vs Imperative programming

Use the each() function to visit each DOM element in a selection. Its callback receives 2 arguments by default: the first is the bound data (d) and the second is the index (i); inside the callback, this is the current DOM element.
d3.selectAll("ul").selectAll("li").each(function(d, i) {
    console.log("element", this);
    console.log("data", d);
    console.log("index", i);
});

Setter method to bind the data
d3.select("ul").selectAll("li").data(grades)

getter method to get the data
d3.select("ul").selectAll("li").data()

Now, if we call text() after binding the data, with an anonymous function that returns d, the text of each DOM element changes to its bound value. See below:
d3.select("ul").selectAll("li").data(grades).text(function(d) {
    return d;
});
That is the state after binding and returning the new text.
The original text was shown in a screenshot that is not preserved here.

If the data array has the same number of items as there are DOM elements, each element's text is changed accordingly. However, if there are more data items than DOM elements (e.g. I added one more item, so grades now has 4 items while the HTML defines only 3 elements), the data changes only the first 3 DOM elements.
To bind the extra data item as well, call enter() and then append(). enter() returns placeholders for the data items that have no matching DOM element, and append() creates a new DOM element for each of those orphan data items.
As the image (not preserved here) showed, a new DOM element was created with the value 40, while the ones that already existed stayed as they were.

If you have fewer data items than DOM elements and need to keep the DOM in sync with the data, call exit().remove(): exit() selects the DOM elements left without data, and remove() deletes them.
The whole objective is to make the code data-driven: you sync or change the DOM elements based on the data you have.

If you have more data, add DOM elements; if you have less data, remove DOM elements.
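
To convince myself of the add/remove logic, here is a plain-Python analogy of the index-based join. It is not D3 code: data_join and sync are names I made up, and the list values are hypothetical.

```python
def data_join(dom_texts, data):
    """Pair elements and data by index, like selection.data(data) without a key function."""
    common = min(len(dom_texts), len(data))
    update = list(zip(dom_texts[:common], data[:common]))  # existing element + its datum
    enter = data[common:]       # data with no element yet (D3: enter(), then append())
    leaving = dom_texts[common:]  # elements with no data left (D3: exit(), then remove())
    return update, enter, leaving


def sync(dom_texts, data):
    """Return the element texts after update, enter/append and exit/remove."""
    update, enter, _leaving = data_join(dom_texts, data)
    synced = [str(datum) for _element, datum in update]  # like .text(d => d)
    synced += [str(datum) for datum in enter]            # new elements for extra data
    return synced                                        # leftover elements are dropped


# Hypothetical values: 3 existing <li> texts and 4 data items.
more_data = sync(["a", "b", "c"], [10, 20, 30, 40])
# Fewer data items than elements: the extra element is removed.
less_data = sync(["a", "b", "c"], [10, 20])

print(more_data)  # ['10', '20', '30', '40']
print(less_data)  # ['10', '20']
```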

We saw how to change or add text on a DOM element. To insert HTML instead, use the .html() function and return the HTML tags as a string.
As shown below, we first bind our data to the DOM and store the result in table_body: we select all "tr" elements under the "tbody" element and bind the data against them.

Use following link to understand more of it
https://medium.com/@c_behrens/enter-update-exit-6cafc6014c36
and
https://medium.com/@bryony_17728/d3-js-merge-in-depth-a3069749a84f
and
https://bost.ocks.org/mike/join/

var austinWeather = [{
  date: "2018-02-01",
  low: 51,
  high: 76
},
{
  date: "2018-02-02",
  low: 47,
  high: 59
},
{
  date: "2018-02-03",
  low: 44,
  high: 59
},
{
  date: "2018-02-04",
  low: 52,
  high: 73
},
{
  date: "2018-02-05",
  low: 47,
  high: 71
}
];
// YOUR CODE HERE
let table_body = d3.select(".table-striped")
                    .selectAll("tbody")
                    .selectAll("tr")
                    .data(austinWeather)
                    .enter()

table_body.append("tr")
          .html(function(d){
            return `<td>${d.date}</td><td>${d.low}</td><td>${d.high}</td>`
          })
-----12/17-----
Leaflet is a JS library for interactive maps; here it is used with map tiles from Mapbox.
So Leaflet itself does not need a token or key, but Mapbox still needs a key.
You use this Mapbox key in the Leaflet JS script.
---12/19---
When you create a token at Mapbox, make sure you select all the options before using it;
this is needed in many cases, otherwise it usually gives an error.

----------------------1/23----------------------
Machine Learning
Scikit Learn Library - https://scikit-learn.org/stable/
Intelligent Algorithms
  Unsupervised
    Clustering -- finding groups within a population
    Dimensionality Reduction
  Supervised
    Classification
    Regression
  Reinforcement

model.coef_ == how steep our fitted line is (the slope)
Machine Learning -- Day 1
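
To see what the coefficient means, here is a small least-squares fit in plain Python. This is my own sketch, not the class code: fit_line and the sample points are made up, and I am assuming a single-feature linear regression, where scikit-learn's LinearRegression would report the same slope in coef_.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

print(slope, intercept)  # 2.0 1.0
```

A steeper line means a larger coefficient: each step of 1 in x moves the prediction by the slope.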

-------01/24----
On 01/23 the instructor asked us to install the following 2 packages. I did this installation on both the Anaconda Prompt and the normal command prompt.

(base) C:\Users\manda>pip install graphviz
Collecting graphviz
  Downloading https://files.pythonhosted.org/packages/f5/74/dbed754c0abd63768d3a7a7b472da35b08ac442cf87d73d5850a6f32391e/graphviz-0.13.2-py2.py3-none-any.whl
Installing collected packages: graphviz
Successfully installed graphviz-0.13.2

(base) C:\Users\manda>pip install pydotplus
Collecting pydotplus
  Downloading https://files.pythonhosted.org/packages/60/bf/62567830b700d9f6930e9ab6831d6ba256f7b0b730acb37278b0ccdffacf/pydotplus-2.0.2.tar.gz (278kB)
     |████████████████████████████████| 286kB 819kB/s
Requirement already satisfied: pyparsing>=2.0.1 in c:\users\manda\anaconda3\lib\site-packages (from pydotplus) (2.4.0)
Building wheels for collected packages: pydotplus
  Building wheel for pydotplus (setup.py) ... done
  Stored in directory: C:\Users\manda\AppData\Local\pip\Cache\wheels\35\7b\ab\66fb7b2ac1f6df87475b09dc48e707b6e0de80a6d8444e3628
Successfully built pydotplus
Installing collected packages: pydotplus
Successfully installed pydotplus-2.0.2

Sat webcast --
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=56e4d5d8-598a-4f3b-85dd-ab4d00f6bec8

------Logistic Regression-------

I got the error "InvocationException: GraphViz's executables not found" while running the following code
# tree, clf and iris are assumed to come from earlier notebook cells
import graphviz
dot_data = tree.export_graphviz(
    clf, out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True, rounded=True,
    special_characters=True)

import pydotplus
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('iris.png')

graph = graphviz.Source(dot_data)
graph

https://datascience.stackexchange.com/questions/37428/graphviz-not-working-when-imported-inside-pydotplus-graphvizs-executables-not

I fixed it by setting the system environment variable (PATH) to include the following Graphviz executable folder on my machine:
C:\Users\manda\Anaconda3\Library\bin\graphviz

Then I closed all Jupyter Notebook sessions and even the Anaconda Prompts,
restarted Jupyter Notebook, and finally it worked.

What are type1 & type2 errors and how they impact in choosing a model?

https://www.lpsm.paris/pageperso/has/source/Hand-on-ML.pdf

----------------Neural Networks---------------
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=a1a5bb35-7443-4695-8515-ab5001808529

http://playground.tensorflow.org/

Tensor Flow with JavaScript
https://www.tensorflow.org/js/

https://www.manning.com/books/deep-learning-with-python

https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/

The following is the last code cell in the first instructor (Ins) activity, which needed a change

from tensorflow.keras.utils import to_categorical  # this is the line that needed to be changed; it is working now

# Step 2: One-hot encoding
one_hot_y = to_categorical(encoded_y)
one_hot_y
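
To see what one-hot encoding produces without TensorFlow, here is a plain-Python sketch. The one_hot helper and the sample labels are hypothetical; I am assuming integer labels starting at 0, with the number of classes defaulting to the largest label plus 1.

```python
def one_hot(labels, num_classes=None):
    """Turn integer class labels into one row per label with a single 1.0."""
    if num_classes is None:
        num_classes = max(labels) + 1  # assumption: labels are 0..max
    return [
        [1.0 if position == label else 0.0 for position in range(num_classes)]
        for label in labels
    ]


# Hypothetical encoded labels for 3 classes.
sample_encoded_y = [0, 2, 1, 2]
sample_one_hot_y = one_hot(sample_encoded_y)

for row in sample_one_hot_y:
    print(row)
# [1.0, 0.0, 0.0]
# [0.0, 0.0, 1.0]
# [0.0, 1.0, 0.0]
# [0.0, 0.0, 1.0]
```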

-----------------Big Data-- Day1-------------
Installed mrjob with "pip install mrjob" -- here mr stands for MapReduce

conda install mrjob failed, so I tried pip install mrjob

https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=5727ce41-3070-4e79-ae66-ab5201830ad6
C:\Users\mgogate>pip install mrjob
Collecting mrjob
  Downloading https://files.pythonhosted.org/packages/e6/25/5db4339980022a30a4b354f15a4ac74b8bfe2456f756274a5048ac40f117/mrjob-0.7.1-py2.py3-none-any.whl (434kB)
     |████████████████████████████████| 440kB 1.3MB/s
Requirement already satisfied: PyYAML>=3.10 in c:\users\mgogate\appdata\local\continuum\anaconda3\lib\site-packages (from mrjob) (5.1.1)
Installing collected packages: mrjob
Successfully installed mrjob-0.7.1

C:\Users\mgogate>


https://aws.amazon.com/s3/ -- for today csv files are hosted here

You can run the programs in the following way
1st example----
PS C:\Mandar\GeorgiaTech\BigData> python.exe .\bacon_counter.py .\input.txt
No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory C:\Users\mgogate\AppData\Local\Temp\bacon_counter.mgogate.20200131.003741.106758
Running step 1 of 1...
job output is in C:\Users\mgogate\AppData\Local\Temp\bacon_counter.mgogate.20200131.003741.106758\output
Streaming final output from C:\Users\mgogate\AppData\Local\Temp\bacon_counter.mgogate.20200131.003741.106758\output...
"bacon" 21
Removing temp directory C:\Users\mgogate\AppData\Local\Temp\bacon_counter.mgogate.20200131.003741.106758...

2nd example----
PS C:\Mandar\GeorgiaTech\BigData> python.exe .\hot.py .\austin_weather_2017.csv
No configs found; falling back on auto-configuration
No configs specified for inline runner
Creating temp directory C:\Users\mgogate\AppData\Local\Temp\hot.mgogate.20200131.004644.152265
Running step 1 of 1...
job output is in C:\Users\mgogate\AppData\Local\Temp\hot.mgogate.20200131.004644.152265\output
Streaming final output from C:\Users\mgogate\AppData\Local\Temp\hot.mgogate.20200131.004644.152265\output...
"AUSTIN 6 S"    6
"AUSTIN BERGSTROM INTERNATIONAL AIRPORT"        23
"AUSTIN CAMP MABRY"     42
"AUSTIN GREAT HILLS"    7
Removing temp directory C:\Users\mgogate\AppData\Local\Temp\hot.mgogate.20200131.004644.152265...
PS C:\Mandar\GeorgiaTech\BigData>


---------------------------------
Create a new account here..
https://www.zepl.com/register


After login-- go to resources
 Then click on interpreter settings
  Then click on spark [...] and set it as default
    then again click on ... and change default python3 to python


Click on Spaces
      Either use existing space
      or create a new space
      Click on Space
      Click on Import
      Upload a json file .. and following log will appear

%pyspark
# Read in data from S3 Buckets
from pyspark import SparkFiles
url = "https://s3.amazonaws.com/dataviz-curriculum/day_1/food.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv(SparkFiles.get("food.csv"), sep=",", header=True)

# Show DataFrame
df.show()

https://s3.amazonaws.com/dataviz-curriculum/day_1/food.csv -- this is nothing but a CSV file

Once you are done with that, click on Run

So the input JSON file is nothing but the notebook (.ipynb) file; that is how a notebook looks in JSON format.


---------------- 22.2 - Big Data in the Cloud-----------------
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=8a72e833-82d0-4ee5-a7ff-ab5400f73395

We cannot apply a plain Python function directly to a PySpark DataFrame column, which is why we have to wrap the normal Python function in a user defined function (UDF).
-- below is the Python function
%pyspark
# Create a Function to count vowels
def vowel_counter(words):
    vowel_count = 0

    for word in words:
        for vowel in word:
            if vowel in ('a', 'e', 'i', 'o', 'u'):
                vowel_count += 1

    return vowel_count

--created the user defined function with return type integer
%pyspark
# Store a user defined function
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

count_vowels = udf(vowel_counter, IntegerType())
count_vowels

-- called the user defined function on a Spark DataFrame
%pyspark
# Create new DataFrame with the udf
tokenized.select("Poem", "words")\
    .withColumn("vowels", count_vowels(col("words"))).show(truncate=False)
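
Outside Spark, the same function can be checked on plain lists to see what the UDF computes for each row. The rows below are made-up word lists, not the poem data from class.

```python
def vowel_counter(words):
    """Count lowercase vowels across all words in one row (same logic as the UDF)."""
    vowel_count = 0
    for word in words:
        for letter in word:
            if letter in ('a', 'e', 'i', 'o', 'u'):
                vowel_count += 1
    return vowel_count


# Hypothetical tokenized rows, one list of words per row.
rows = [["the", "cow", "jumped"], ["over", "the", "moon"]]
counts = [vowel_counter(words) for words in rows]

print(counts)  # [4, 5]
```

Note that uppercase vowels are not counted, so the words should be lowercased first.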

------For ML

*****************FEATURE ENGINEERING***************
https://github.com/yanshengjia/ml-road/blob/master/resources/Feature%20Engineering%20for%20Machine%20Learning.pdf

Feature engineering is needed because, when there are many columns, we do not know which ones to use for our model. The technique for working out which columns to use is called feature engineering. The book above covers it.


%pyspark
# Run the hashing term frequency
from pyspark.ml.feature import HashingTF

hashing = HashingTF(inputCol="tokens", outputCol="hashedValues", numFeatures=pow(2,4))
# Transform into a DF
hashed_df = hashing.transform(wordsData)

Here numFeatures is pow(2, 4), which is 16, so each token is hashed into one of 16 buckets.

%pyspark
# Display new DataFrame
hashed_df.show(truncate=False)

+---+---------------------------------+-----------------------------------------+----------------------------------------------+
|id |words                            |tokens                                   |hashedValues                                  |
+---+---------------------------------+-----------------------------------------+----------------------------------------------+
|0  |The cow cow jumped and jumped cow|[the, cow, cow, jumped, and, jumped, cow]|(16,[11,13,14,15],[2.0,1.0,1.0,3.0])          |
|1  |then the cow said                |[then, the, cow, said]                   |(16,[0,13,14,15],[1.0,1.0,1.0,1.0])           |
|2  |I am a cow that jumped           |[i, am, a, cow, that, jumped]            |(16,[0,1,2,5,11,15],[1.0,1.0,1.0,1.0,1.0,1.0])|
+---+---------------------------------+-----------------------------------------+----------------------------------------------+
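The 16 at the start of each hashedValues entry is the vector size, which is nothing but the pow(2,4) defined earlier. The first list holds the bucket indices and the second list holds the count in each bucket.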
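
The idea behind the hashed vector can be sketched in plain Python. This is only an illustration: hashing_tf is my own helper and it uses zlib.crc32 as the hash, which is an assumption and not the hash Spark uses, so the bucket indices will not match the table above. The shape of the result (size, indices, counts) is the same.

```python
import zlib
from collections import Counter


def hashing_tf(tokens, num_features=16):
    """Hash each token into one of num_features buckets and count hits per bucket."""
    buckets = Counter(zlib.crc32(token.encode("utf-8")) % num_features for token in tokens)
    indices = sorted(buckets)
    values = [float(buckets[index]) for index in indices]
    return num_features, indices, values


tokens = ["the", "cow", "cow", "jumped", "and", "jumped", "cow"]
size, indices, values = hashing_tf(tokens, num_features=pow(2, 4))

print(size, indices, values)
```

With only 16 buckets, different words can land in the same bucket, so a count can cover more than one word.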

The brown, brown, brown fox jumped over the log next to the stream

#total documents brown appears

if there are 14 documents with same line then
TF - Term Frequency = 3/14 = .21

Total Documents e.g. 100
#words appeared in each document = 1

IDF - Inverse Document Frequency = 2.00
TF * IDF = 0.21 * 2 = .43 (using the unrounded TF, 3/14)
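
The arithmetic above can be reproduced as follows. The numbers 3, 14, 100 and 1 come from the notes; the base-10 logarithm is my assumption, chosen because it gives IDF = 2.00 for 100 documents when the word appears in 1 of them.

```python
import math

term_count = 3         # times "brown" appears
total_count = 14       # denominator used in the notes
total_documents = 100
documents_with_term = 1

tf = term_count / total_count
idf = math.log10(total_documents / documents_with_term)  # assumed base-10 log
tf_idf = tf * idf

print(round(tf, 2), round(idf, 2), round(tf_idf, 2))  # 0.21 2.0 0.43
```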

part2 live stream
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=70d352d6-3460-4d5c-812e-ab54011da6d2 


---------------------------------- Big Data ETL--------------
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6987ef49-3c8c-4006-b5de-ab5701843642

df.to_sql -- function to write a DataFrame to a database table (it needs a database connection)

Amazon S3 -- Simple Storage Service / File Storage

Big Data - Day 3 - Live Stream part 2

https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d671e875-0dbd-4ed3-921e-ab5800167218


SageMaker is a Jupyter Notebook service hosted on Amazon.
You can deploy the model there and use it as an API.

------------------------------------------------Project 3 Day 1-------------------------------
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=ef3f43a4-9b17-4489-ba9e-ab59018397ec

https://colab.research.google.com/notebooks/intro.ipynb#recent=true

Google Colab -- a notebook provided by Google

--------------------------------Project Day 2 ------------------------------
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=21f2bff6-9d57-4bb4-8b80-ab5b00f81796



My Presentation
https://codingbootcamp.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=e604e382-30d6-4a40-97fd-ab42017fc00f

---------------------
https://gtagency.github.io/lectures
ML club on campus, meetings and project work every week, meet some more students working in ML

https://www.streamlit.io/


--------------------------
Important Links
Bootcamp Content -- this has all videos and study material
https://gt.bootcampcontent.com/users/sign_in
username -- mandargogate@outlook.com
Master Branch -- https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201908DATA3/-/tree/master/
Best of Videos -- https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201908DATA3/blob/master/Best-of-Videos.md
Pre-Course Work -- https://coding-bootcamp-dataviz-prework.readthedocs-hosted.com/en/latest/
https://www.bootcampspot.com/ -- Main dashboard for attendance, course work submission etc.
Home Work Links -- https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201908DATA3/-/tree/master/01-Excel/Homework/Instructions
Always go inside Instructions and then open ReadMe
Class Work -- go to the lesson plan links as below
https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201908DATA3/-/blob/master/11-Web/1/LessonPlan.md
The following is another example; it is all mentioned there, just check carefully
https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201908DATA3/-/blob/master/12-Web-Scraping-and-Document-Databases/2/LessonPlan.md
