Sunday, November 3, 2019

Data Structures in Python



Pandas works with three data structures:

  • DataFrame
  • Series
  • Panel (deprecated in pandas 0.20 and removed in 0.25)


DataFrame:
A DataFrame sits somewhere between a SQL table and an Excel spreadsheet. Pandas is a library, and its whole idea is to let you do in Python what you would otherwise do in SQL or Excel. A DataFrame is a two-dimensional, labeled data structure that is displayed in tabular format.

I. To define a DataFrame, pandas must first be imported.

Below are a few examples:
First, create a dictionary:


1. Convert dictionary into DataFrame:


2. To check the type of frame:
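The screenshots that originally went with these steps are not available, so here is a minimal sketch of the dictionary, the conversion and the type check. The column names and values are made up for illustration; they are not the post's original data.

```python
import pandas as pd

# Illustrative data (assumed, the original dictionary is not shown)
data = {
    "state": ["Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2001, 2002],
    "pop": [1.5, 1.7, 2.4, 2.9],
}

# Each key becomes a column, each list becomes that column's values
frame = pd.DataFrame(data)

print(frame)
print(type(frame))  # confirms that frame is a pandas DataFrame
```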


3. To access what is inside the DataFrame: [This is equivalent to select state from data;]

4. To access it in frame:

5. To access it in table format: [This is equivalent to select state from frame;]
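Steps 3 to 5 could look roughly like the sketch below. The data is again an illustrative assumption; the point is the difference between reading from the dictionary, getting a Series, and getting a one-column DataFrame.

```python
import pandas as pd

# Illustrative data (assumed)
data = {
    "state": ["Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2001, 2002],
}
frame = pd.DataFrame(data)

from_dict = data["state"]     # step 3: a plain Python list from the dictionary
as_series = frame["state"]    # step 4: a Series (frame.state works too)
as_table = frame[["state"]]   # step 5: double brackets keep a DataFrame

print(type(as_series))
print(type(as_table))
```

Single brackets return a Series, while double brackets pass a list of column names and therefore return a table.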

6. To change the labels: here the index is changed from numbers to words.


7. To check the type of column:

8. To check the type of index:

9. To give value to column:
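A possible version of steps 6 to 9 is sketched below. It assumes that step 6 replaced the default numeric index with word labels, and the "debt" column is a hypothetical name chosen only for this example.

```python
import pandas as pd

# Illustrative data (assumed)
data = {
    "state": ["Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2001, 2002],
}

# Step 6: index= replaces the default 0..n-1 labels with words
frame = pd.DataFrame(data, index=["one", "two", "three", "four"])

# Steps 7 and 8: inspect the column and index objects
print(type(frame.columns))
print(type(frame.index))
print(frame.dtypes)  # data type of each column

# Step 9: assigning a scalar fills every row of the column
frame["debt"] = 16.5  # "debt" is a hypothetical column name
print(frame)
```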



       Summary for DataFrames:
        ----Two ways to get a column as a DataFrame----
        1. frame1[['data']]
        2. pd.DataFrame(frame2.data)


II. Filtering DataFrames

     Indexing in pandas is done with loc and iloc. Older versions also had ix, which was deprecated in pandas 0.20 and removed in 1.0.

1. loc (explicit)  : used for indexing or selecting by name, i.e. by row label and column label.


2. iloc (implicit) : used for indexing or selecting by position, i.e. by row number and column number (positions start at zero).

3. ix (removed)    : allowed indexing by both position and name. Use loc or iloc instead.
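Here is a small sketch of label-based and position-based selection, with made-up data. It leaves out ix on purpose, because ix was removed in pandas 1.0 and the code would not run on current versions.

```python
import pandas as pd

# Illustrative frame with word labels for the rows (assumed data)
frame = pd.DataFrame(
    {"state": ["Ohio", "Nevada", "Texas"], "pop": [1.5, 2.4, 3.1]},
    index=["one", "two", "three"],
)

by_name = frame.loc["two", "state"]    # row label, column label
by_pos = frame.iloc[1, 0]              # row position, column position

label_slice = frame.loc["one":"two"]   # label slices include the end label
pos_slice = frame.iloc[0:2]            # position slices exclude the end

print(by_name, by_pos)
print(label_slice)
```

Note the slicing difference: loc includes the last label, iloc stops before the last position, so both slices above return the same two rows.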


Series:
A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). An example of a Series is one column of a DataFrame.

Below are basic operations on Pandas Series:

  1. Creating a Pandas Series.
  2. Accessing elements of Series.
  3. Indexing and Selecting Data in Series.
  4. Conversion operation on Series.

1. Creating a Pandas Series: A Pandas Series can be created by loading data from existing storage, which can be a SQL database, a CSV file or an Excel file. A Series can also be created from an array, a list, a dictionary, or a scalar value.

a. creating a series from array:


b. creating a series from lists:
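A minimal sketch of the in-memory ways to build a Series mentioned above (array, list, dictionary, scalar). All values and labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# a. from a NumPy array (default integer index 0..5)
from_array = pd.Series(np.array(["p", "a", "n", "d", "a", "s"]))

# b. from a list, here with custom labels
from_list = pd.Series([10, 20, 30], index=["a", "b", "c"])

# from a dictionary: keys become the index
from_dict = pd.Series({"x": 1, "y": 2})

# from a scalar: the value is repeated for every index entry
from_scalar = pd.Series(5, index=[0, 1, 2])

print(from_array)
print(from_list)
```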




2. Accessing elements of Series: There are two ways to access elements:
a. accessing elements from Series with Position:


b. accessing elements from Series using Label(index):
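Both ways of accessing elements can be sketched as follows, on a made-up Series. Position access is written with .iloc here so that the intent is explicit even when the labels are not integers.

```python
import pandas as pd

# Illustrative Series with letter labels (assumed data)
ser = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# a. by position
first = ser.iloc[0]
first_two = ser.iloc[:2]

# b. by label
by_label = ser["c"]
several = ser[["a", "d"]]  # a list of labels returns a Series

print(first, by_label)
print(several)
```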



3. Indexing and Selecting Data in Series: It means selecting particular data from a Series. Indexing is also known as Subset Selection.

a. indexing a Series using indexing operator []:


b. indexing a Series using .loc[]:


c. indexing a Series using .iloc[]:
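The three indexing styles might look like this on a small, made-up Series. The boolean mask at the end is an extra illustration of subset selection, not something the original post is known to have shown.

```python
import pandas as pd

# Illustrative Series (assumed data)
ser = pd.Series([1.5, 2.5, 3.5, 4.5, 5.5], index=list("abcde"))

op_slice = ser["b":"d"]        # a. [] with labels: end label is included
loc_slice = ser.loc["b":"d"]   # b. .loc: label based, end label is included
iloc_slice = ser.iloc[1:4]     # c. .iloc: position based, end is excluded

mask = ser[ser > 3]            # subset selection with a condition

print(loc_slice)
print(mask)
```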



4. Conversion operation on Series: In conversion operation, we perform various operations like changing datatype of series, changing a series to list etc. To perform these operations, there are various functions like .astype() , .tolist() etc.

a. to convert the datatype of a Series:
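A short sketch of .astype() and .tolist(), assuming a Series of numeric strings as the starting point:

```python
import pandas as pd

# Illustrative Series holding numbers as text (assumed data)
ser = pd.Series(["1", "2", "3"])

as_int = ser.astype("int64")     # text -> integers
as_float = as_int.astype(float)  # integers -> floats
as_list = as_int.tolist()        # Series -> plain Python list

print(as_int.dtype, as_float.dtype)
print(as_list)
```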




Panels: 
Panel is a container for 3-dimensional data. It was rarely used, and it was deprecated in pandas 0.20 and removed in 0.25, so the calls in this section only run on older versions. The names of the 3 axes are intended to give some semantic meaning to operations involving panel data, in particular econometric analysis of panel data.
Syntax:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
where,
data: takes various forms like ndarray, Series, map, lists, dict, constants, and also another DataFrame.
items: axis=0
major_axis: axis=1
minor_axis: axis=2
dtype: data type of each column
copy: copy data, default False.
Below are basic operations on a Pandas Panel:

  1. Create Panel.
  2. Selecting the data from Panel. 





1. Create Panel: A panel can be created in the following ways:
a. from ndarrays:

Empty Panel looks like: 



b. from dict to DataFrame:



2. Selecting the Data from Panel: Select the data from the panel using:
a. items:

We have two items, and we retrieved item1. The result is a DataFrame with 4 rows and 2 columns, which are the Major_axis and Minor_axis dimensions respectively.

b. major_axis:



c. minor_axis:
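Since pandas.Panel no longer exists in pandas 0.25 and later, the original Panel calls cannot be run on a current install. The sketch below is a swapped-in alternative, not the post's method: it keeps the same three axes by stacking a dictionary of DataFrames into one frame with a two-level index. The item names, level names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Two "items", each a 4 x 2 DataFrame (illustrative values)
frames = {
    "Item1": pd.DataFrame(np.arange(8).reshape(4, 2), columns=["a", "b"]),
    "Item2": pd.DataFrame(np.arange(8, 16).reshape(4, 2), columns=["a", "b"]),
}

# Stack them: outer index level = item, inner level = row (major axis)
stacked = pd.concat(frames, names=["item", "row"])

item1 = stacked.loc["Item1"]           # like selecting by items
major = stacked.xs(1, level="row")     # like major_axis: row 1 of every item
minor = stacked["a"].unstack("item")   # like minor_axis: column "a" of every item

print(item1)
print(major)
print(minor)
```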

-------------------------------------------------------------------------------------------------------------------------

Other important commands in Python





1. arange() : [ PS: it is arange, not arrange; even I got confused the first time :p ]
Unlike Python's range(), arange() returns a fully built array that occupies memory, which can be an overhead for very large ranges.



2. reshape():
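A quick sketch of arange() and reshape() with NumPy; the sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

values = np.arange(12)          # 0 to 11, the stop value is excluded
stepped = np.arange(2, 11, 3)   # start, stop, step

grid = values.reshape(3, 4)     # 3 rows, 4 columns
auto = values.reshape(-1, 6)    # -1 lets NumPy work out the row count

print(stepped)
print(grid)
print(auto.shape)
```

reshape() only works when the total number of elements stays the same, so 12 values can become 3 x 4 or 2 x 6 but not 5 x 2.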


3. Transpose frame: 


4. Transpose the transposed frame (this gives back the original):
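Transposing can be sketched like this, on a small made-up frame. Transposing twice is assumed to be what the second step showed.

```python
import numpy as np
import pandas as pd

# Illustrative 2 x 3 frame (assumed data)
frame = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["a", "b", "c"],
)

flipped = frame.T   # rows become columns and columns become rows
back = flipped.T    # transposing again restores the original layout

print(flipped)
print(back)
```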


Example: 


1. re-indexing or changing the order of rows:


2. re-indexing or changing the order of columns:


3. re-indexing:
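The re-indexing steps can be sketched with reindex(), using a made-up 3 x 3 frame. The last call assumes the third step asked for a label that does not exist yet, which pandas fills with NaN.

```python
import numpy as np
import pandas as pd

# Illustrative frame (assumed data)
frame = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

rows = frame.reindex(["c", "a", "b"])            # 1. new row order
cols = frame.reindex(columns=["z", "x", "y"])    # 2. new column order
extra = frame.reindex(["a", "b", "c", "d"])      # 3. unknown label "d" gives a NaN row

print(rows)
print(cols)
print(extra)
```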


4. Using a comparison operator:


5. Using a comparison operator that gives the result as True/False:
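A sketch of both uses of comparison operators, on made-up data: filtering rows with a condition, and producing a table of True/False values.

```python
import numpy as np
import pandas as pd

# Illustrative frame (assumed data)
frame = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

filtered = frame[frame["x"] > 2]   # keep only rows where column x exceeds 2
mask = frame < 5                   # same shape as frame, filled with True/False
masked = frame[frame < 5]          # values failing the test become NaN

print(filtered)
print(mask)
print(masked)
```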


6. To display everything up to three.



7. The '+' operator adds two frames element by element (and joins strings in text columns). To concatenate two data sets, use pd.concat().
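It helps to see what '+' actually does in pandas next to pd.concat(). This is a minimal sketch with made-up frames: '+' works element by element on matching labels, while pd.concat() stacks the rows.

```python
import pandas as pd

# Illustrative frames with the same labels (assumed data)
left = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})
right = pd.DataFrame({"n": [10, 20], "s": ["x", "y"]})

added = left + right   # numbers are summed, strings are joined
stacked = pd.concat([left, right], ignore_index=True)  # rows of right appended

print(added)
print(stacked)
```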



Extracting and Scraping Weather Data: Weather Forecast



We can extract forecast data from a local weather site. You can extract the weather information from this page.

  • First, explore the page structure with Chrome DevTools.
You can open the developer tools by pressing Ctrl+Shift+I or via View -> Developer -> Developer Tools.
Look for the outermost element that contains all the text of the extended forecast.

In this case, the outermost tag is a div tag with the id seven-day-forecast. Under it comes the following section:


Now, each forecast item (like "Tonight", "Thursday" and "Thursday Night") is contained in a div tag with the class tombstone-container.





  • Download the page and parse it.
1.     Download the web page containing the forecast.
2.     Create a BeautifulSoup object to parse the page.
3.     Find the div with the id seven-day-forecast and assign it to seven_day.
4.     Inside seven_day, find each individual forecast item.
5.     Extract and print the first forecast item (tonight).
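The steps above can be sketched as follows. To keep the example runnable without a network connection, it parses a small hand-written HTML snippet instead of the real page. The snippet, its text values and the inner class names (period-name, short-desc, temp) are assumptions; only the id seven-day-forecast and the class tombstone-container come from the page structure described above.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the downloaded page.
# On the live site you would first download the HTML, for example
# with requests.get(url).text (the URL is not given here).
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img src="night.png" title="Tonight: Mostly clear, with a low around 49."></p>
    <p class="short-desc">Mostly clear</p>
    <p class="temp">Low: 49 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday</p>
    <p><img src="day.png" title="Thursday: Sunny, with a high near 63."></p>
    <p class="short-desc">Sunny</p>
    <p class="temp">High: 63 °F</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")                # step 2
seven_day = soup.find(id="seven-day-forecast")           # step 3
forecast_items = seven_day.find_all(class_="tombstone-container")  # step 4
tonight = forecast_items[0]                              # step 5
print(tonight.prettify())
```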


  • Extract information from the page.
1.     tonight has all the information we need. There are four pieces of information in tonight:
              a) Name of the forecast item, e.g. Tonight.
              b) Description of the weather, stored in the title property of the image.
              c) Short description of the weather, e.g. Mostly clear.
              d) Temperature, e.g. 49 degrees.
Let us extract the forecast item name, the short description and the temperature.


We can extract the "title" attribute from the 'img' tag. We can simply treat the BeautifulSoup object like a dictionary and pass in the attribute we want as the key:
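Here is a sketch of pulling the four pieces out of a single forecast item. The HTML is a hand-written stand-in, and the class names period-name, short-desc and temp are assumptions about the page markup, so check them in DevTools before relying on them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one forecast item (assumed structure)
html = """
<div class="tombstone-container">
  <p class="period-name">Tonight</p>
  <p><img src="night.png" title="Tonight: Mostly clear, with a low around 49."></p>
  <p class="short-desc">Mostly clear</p>
  <p class="temp">Low: 49 °F</p>
</div>
"""
tonight = BeautifulSoup(html, "html.parser").find(class_="tombstone-container")

period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()

# A tag behaves like a dictionary for its attributes
desc = tonight.find("img")["title"]

print(period, short_desc, temp, sep=" | ")
print(desc)
```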






  • Extract all the information from the page.

     Now, we will extract all the information from the page. In the previous extraction, we only extracted a single item, i.e. Tonight.
    Here we have extracted only the names of the forecast periods (the weekdays):

    We will also extract the short description, temperature and description:
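Extracting every item at once could look like the sketch below, which uses CSS selectors and list comprehensions. As before, the HTML is a hand-written stand-in and the inner class names are assumptions about the page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the forecast section (assumed structure)
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img src="night.png" title="Tonight: Mostly clear, with a low around 49."></p>
    <p class="short-desc">Mostly clear</p>
    <p class="temp">Low: 49 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday</p>
    <p><img src="day.png" title="Thursday: Sunny, with a high near 63."></p>
    <p class="short-desc">Sunny</p>
    <p class="temp">High: 63 °F</p>
  </div>
</div>
"""
seven_day = BeautifulSoup(html, "html.parser").find(id="seven-day-forecast")

# select() takes a CSS selector and returns every matching tag
periods = [t.get_text() for t in seven_day.select(".tombstone-container .period-name")]
short_descs = [t.get_text() for t in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [img["title"] for img in seven_day.select(".tombstone-container img")]

print(periods)
print(short_descs)
print(temps)
print(descs)
```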






  • Combine the above data into a Pandas DataFrame.

    Now, we will combine the above extracted data into DataFrame (tabular format):
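A sketch of the combining step. The four lists below are hard-coded sample values standing in for the scraped results, and the column names are hypothetical choices for this example.

```python
import pandas as pd

# Sample values standing in for the scraped lists (made up for illustration)
periods = ["Tonight", "Thursday", "Thursday Night"]
short_descs = ["Mostly clear", "Sunny", "Partly cloudy"]
temps = ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"]
descs = [
    "Tonight: Mostly clear, with a low around 49.",
    "Thursday: Sunny, with a high near 63.",
    "Thursday Night: Partly cloudy, with a low around 50.",
]

# One dictionary key per column, one list per column's values
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs,
})
print(weather)
```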



    • Let's play with the above data.
    a) You can use a regular expression and the Series.str.extract method to pull out the numeric temperature values.


    b) You can also find the mean of the temperatures:


    c) Select the rows that happen at night:
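The three experiments can be sketched on a small sample frame. The values are made up, and picking night rows by looking for "Low" in the temperature text is an assumption about how the site labels night-time temperatures.

```python
import pandas as pd

# Sample frame standing in for the scraped data (made-up values)
weather = pd.DataFrame({
    "period": ["Tonight", "Thursday", "Thursday Night"],
    "temp": ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"],
})

# a) pull the first run of digits out of each temperature string
temp_nums = weather["temp"].str.extract(r"(\d+)", expand=False)
weather["temp_num"] = temp_nums.astype("int64")

# b) mean of the numeric temperatures
mean_temp = weather["temp_num"].mean()

# c) rows that happen at night (assumed to be the ones showing a "Low")
is_night = weather["temp"].str.contains("Low")
night_rows = weather[is_night]

print(weather)
print(mean_temp)
print(night_rows)
```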




    More examples of how to extract, scrape and combine data into a DataFrame will be added to this post soon. Meanwhile, make sure you have a good understanding of the above concepts and keep practising.