2. DataFrame

Two-dimensional, size-mutable, heterogeneous tabular data.

pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame.

index : Index or array-like.

columns : Index or array-like.

dtype : dtype, default None.

copy : bool, default False.
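
As a rough illustration of how these parameters fit together, here is a minimal sketch. The sample values and the labels r1, r2, a, b, c are made up for this example and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: two rows, three columns.
rows = [[1, 2, 3], [4, 5, 6]]

df = pd.DataFrame(
    data=rows,
    index=["r1", "r2"],       # row labels
    columns=["a", "b", "c"],  # column labels
    dtype="float64",          # ask pandas to store every column as float64
)

print(df)
print(df.dtypes)
```

If index and columns are left out, pandas falls back to default integer labels starting at 0.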

1) DataFrame from a dictionary:

import pandas as pd

my_dic = {"name": ["priya", "ram", "sri", "nivi"], "id": [1001, 1002, 1003, 1004], "dept": ["ece", "eie", "ece", "eie"]}
df = pd.DataFrame(my_dic)

# Each dictionary key becomes a column name and each list becomes that column's values.

2) DataFrame from a CSV file:

import pandas as pd

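# The file name must be passed as a quoted string.
my_data = pd.read_csv("patients.csv")

# read_csv already returns a DataFrame, so this extra conversion is optional.
my_dataFrame = pd.DataFrame(my_data)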

Printing the first 10 records:

print(my_dataFrame.head(10))

The head() method returns the top rows of the DataFrame. If no value is passed as a parameter, it returns the top 5 rows by default.

head(10) returns the top 10 rows.

Printing the last 10 records:

print(my_dataFrame.tail(10))

tail(10) returns the bottom 10 records. Like head(), tail() returns 5 rows when no value is passed.
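
To try head() and tail() without a file on disk, here is a small sketch. The CSV text, the column names patient_id and age, and the variable names are all made up for this example; io.StringIO just lets read_csv treat a string like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file: a header plus 12 rows.
csv_text = "patient_id,age\n" + "\n".join(f"{i},{20 + i}" for i in range(1, 13))

# read_csv returns a DataFrame directly.
patients = pd.read_csv(io.StringIO(csv_text))

first_five = patients.head()   # no argument: top 5 rows
first_ten = patients.head(10)  # top 10 rows
last_ten = patients.tail(10)   # bottom 10 rows

print(first_five)
print(last_ten)
```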

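Locating records:

The loc indexer selects rows from a DataFrame by their index labels.

It accepts a single label, a list of labels, or a slice of labels.

my_dataFrame.loc[[0,1]]

Passing a list of labels (the inner []) makes the result a DataFrame, even for a single label. A bare label such as loc[0] returns a Series instead.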
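
The difference between a single label and a list of labels can be seen in this short sketch. It reuses the sample dictionary from the first example; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["priya", "ram", "sri", "nivi"],
    "id": [1001, 1002, 1003, 1004],
    "dept": ["ece", "eie", "ece", "eie"],
})

one_row = df.loc[0]            # single label -> Series
two_rows = df.loc[[0, 1]]      # list of labels -> DataFrame
one_row_frame = df.loc[[0]]    # one-element list -> DataFrame with one row

print(type(one_row).__name__)
print(type(two_rows).__name__)
```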

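Deleting a row or column from a DataFrame:

drop() removes rows or columns and returns a new DataFrame. The original is unchanged unless the result is assigned back or inplace=True is passed.

labels is a row label or a column label (or a list of them).

axis=0 refers to rows and axis=1 refers to columns.

df.drop(labels=2,axis=0)

returns the DataFrame without the row labelled 2.

df.drop(labels="dept",axis=1)

returns the DataFrame without the dept column.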
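
Here is a small sketch showing that drop() leaves the original DataFrame alone by default. The data is the sample dictionary from earlier; without_row and without_dept are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["priya", "ram", "sri", "nivi"],
    "id": [1001, 1002, 1003, 1004],
    "dept": ["ece", "eie", "ece", "eie"],
})

without_row = df.drop(labels=2, axis=0)        # new DataFrame without row label 2
without_dept = df.drop(labels="dept", axis=1)  # new DataFrame without the dept column

print(without_row)
print(without_dept)
print(df.shape)  # the original still has all rows and columns
```

To keep the change, assign the result back, for example df = df.drop(labels=2, axis=0).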

BONUS - Data cleaning:

Handling missing values:

fillna - fills missing values with a given value, commonly the column's mean, median, or mode. It returns a new object unless inplace=True is passed.
- Mean - the average value.
- Median - the middle value after sorting.
- Mode - the most repeated value.
dropna - drops the rows that contain missing values.
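
The options above might look like this in practice. This is only a sketch: the age and city columns and their values are invented for the example.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [20.0, None, 30.0, 30.0],
    "city": ["a", "b", None, "b"],
})

age_mean = df["age"].fillna(df["age"].mean())        # fill with the average
age_median = df["age"].fillna(df["age"].median())    # fill with the middle value
city_mode = df["city"].fillna(df["city"].mode()[0])  # fill with the most repeated value

complete_rows = df.dropna()  # keep only rows with no missing values

print(age_mean)
print(city_mode)
print(complete_rows)
```

mode() returns a Series because there can be more than one most-repeated value, so [0] picks the first one.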

Handling duplicate records:

duplicated - returns True for a row that repeats an earlier row, else False.
drop_duplicates - drops the repeated rows and keeps the first occurrence by default.
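
A quick sketch of both methods on a tiny made-up DataFrame, where the third row repeats the first:

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0.
df = pd.DataFrame({
    "name": ["priya", "ram", "priya"],
    "dept": ["ece", "eie", "ece"],
})

is_repeat = df.duplicated()         # True only for the repeated row
unique_rows = df.drop_duplicates()  # keeps the first occurrence of each row

print(is_repeat)
print(unique_rows)
```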

Learn python from Noob to Pro - Blog for Python tutorial

Sunday, May 23, 2021

Pandas data structure-Intro to DataFrame