2. DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
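- data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame. The values to store.
- index : Index or array-like. Row labels; defaults to 0, 1, 2, ... (a RangeIndex) if not given.
- columns : Index or array-like. Column labels; defaults to a RangeIndex when the data carries no column labels.
- dtype : dtype, default None. A single data type to force on all columns; inferred from the data if None.
- copy : bool, default False. Whether to copy the input data. (Newer pandas versions, 1.3 and later, document the default as None.)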
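As a small sketch of how the parameters above fit together, the snippet below builds a DataFrame from a list of rows. The values and the names rows and df_params are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
rows = [[1001, 72.5], [1002, 64.0], [1003, 80.0]]

df_params = pd.DataFrame(
    data=rows,
    index=["a", "b", "c"],      # row labels
    columns=["id", "weight"],   # column labels
    dtype=float,                # force every column to float
)

print(df_params)
print(df_params.dtypes)
```

Because dtype=float is given, the integer ids are stored as floats too; leaving dtype as None lets pandas infer a type per column.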
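1) DataFrame from a dictionary:
import pandas as pd
# Each key becomes a column name; each list holds that column's values.
my_dic = {"name": ["priya", "ram", "sri", "nivi"], "id": [1001, 1002, 1003, 1004], "dept": ["ece", "eie", "ece", "eie"]}
df = pd.DataFrame(my_dic)
# Converts the dictionary to a DataFrame.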
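2) DataFrame from a CSV file:
import pandas as pd
# The file name must be passed as a string (in quotes).
my_data = pd.read_csv("patients.csv")
# read_csv already returns a DataFrame, so this extra call is optional.
my_dataFrame = pd.DataFrame(my_data)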
Printing the first 10 records:
print(my_dataFrame.head(10))
The head() method returns the top rows. If no value is passed as a parameter, it returns the top 5 rows by default.
head(10) returns the top 10 rows.
Printing the last 10 records:
print(my_dataFrame.tail(10))
tail(10) returns the bottom 10 records; with no argument, tail() returns the bottom 5.
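Locate a record:
The loc indexer returns records from the DataFrame by index label.
It accepts a single label, a list of labels, a slice of labels, or a boolean mask.
my_dataFrame.loc[[0,1]]
Passing a list (the inner []) makes the result a DataFrame; a single label such as loc[0] returns a Series.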
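The different ways of calling loc can be compared side by side. This is a minimal sketch that reuses the sample dictionary from earlier; the variable names one_row, two_rows, sliced, and ece_only are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["priya", "ram", "sri", "nivi"],
    "id": [1001, 1002, 1003, 1004],
    "dept": ["ece", "eie", "ece", "eie"],
})

one_row = df.loc[0]         # single label -> Series
two_rows = df.loc[[0, 1]]   # list of labels -> DataFrame
sliced = df.loc[1:2]        # label slice; both end labels are included

# Boolean mask for the rows, plus a list of columns to keep.
ece_only = df.loc[df["dept"] == "ece", ["name", "id"]]

print(type(one_row).__name__)
print(type(two_rows).__name__)
print(ece_only)
```

Note that loc works with index labels, not positions. Here the two coincide because the default index is 0, 1, 2, 3.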
Deleting a row or column from a DataFrame:
drop() removes rows or columns and returns a new DataFrame; the original is unchanged unless inplace=True is passed.
labels is a single row label or column label, or a list of them.
axis=0 denotes rows and axis=1 denotes columns.
df.drop(labels="dept",axis=1)
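To see both axes in action, here is a short sketch on the same sample data. The result names without_dept, without_rows, and same_as_axis1 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["priya", "ram", "sri", "nivi"],
    "id": [1001, 1002, 1003, 1004],
    "dept": ["ece", "eie", "ece", "eie"],
})

without_dept = df.drop(labels="dept", axis=1)   # remove one column
without_rows = df.drop(labels=[0, 2], axis=0)   # remove the rows labelled 0 and 2
same_as_axis1 = df.drop(columns=["dept"])       # keyword form of the column drop

print(without_dept)
print(without_rows)
print(df.columns.tolist())   # df itself still has all three columns
```

Since drop() returns a new DataFrame, assign the result to a variable if the change should be kept.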
BONUS-Data cleaning:
Handling missing values:
fillna - fills missing values (NaN) with a given value, such as the column's mean, median, or mode. It returns a new object unless inplace=True is passed.
- Mean - average value.
- Median - middle value after sorting.
- Mode - most frequent value.
dropna - drops the rows that contain missing values.
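A minimal sketch of both approaches follows. The table contents and the names patients, filled, and dropped are invented for illustration and are not the contents of patients.csv.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column.
patients = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 25.0],
    "city": ["chennai", "madurai", None, "chennai"],
})

age_mean = patients["age"].mean()        # missing values are skipped
age_median = patients["age"].median()
city_mode = patients["city"].mode()[0]   # mode() returns a Series; take its first entry

filled = patients.copy()
filled["age"] = filled["age"].fillna(age_mean)     # numeric column: fill with the mean
filled["city"] = filled["city"].fillna(city_mode)  # text column: fill with the mode

dropped = patients.dropna()   # keep only the rows with no missing values

print(filled)
print(dropped)
```

Mean or median usually suits numeric columns, while mode is the natural choice for text or categorical columns.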
Handling duplicate records:
- duplicated - returns True for a row that repeats an earlier row, else False.
- drop_duplicates - drops the duplicate rows, keeping the first occurrence by default.
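The two methods can be tried on a small table with one repeated row. This is only a sketch; the data and the names records, dup_mask, and unique_rows are made up.

```python
import pandas as pd

# The third row repeats the first one exactly.
records = pd.DataFrame({
    "name": ["priya", "ram", "priya", "sri"],
    "id": [1001, 1002, 1001, 1003],
})

dup_mask = records.duplicated()          # boolean Series, one value per row
unique_rows = records.drop_duplicates()  # keeps the first copy of each row

print(dup_mask.tolist())
print(unique_rows)
```

Only the later copy is flagged as a duplicate, so the first occurrence survives in unique_rows.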