Exploratory Data Analysis (EDA)
In the EDA_Solution.ipynb file, I explore and analyze a real estate transaction dataset to uncover insights, identify patterns, and prepare the data for building a predictive pricing model.
- Mounting Google Drive in Google Colab: Access files stored in Google Drive to enable seamless data loading.
- Importing Libraries and Loading the Dataset:
  - Import the necessary Python libraries:
    - NumPy for numerical computing.
    - Pandas for data manipulation.
    - Matplotlib for visualization.
    - Seaborn for enhanced visualization.
  - Load the `real_estate.csv` file into a DataFrame.
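
The mounting and loading steps above might look roughly like the sketch below. It is not the notebook's actual code: the helper `mount_drive_if_colab`, the Drive path in the comment, and the inline sample rows are assumptions made so the example runs outside Colab. Only the column names `price`, `sqft`, `lot_size`, and `property_type` come from this README.

```python
import io

import pandas as pd


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


# Hypothetical rows standing in for real_estate.csv.
SAMPLE_CSV = """price,sqft,lot_size,property_type
295000,584,0,Condo
216500,612,0,Condo
279900,615,0,Condo
379900,1618,6000,Bungalow
340000,2150,8500,Bungalow
265000,1900,7200,Bungalow
"""

in_colab = mount_drive_if_colab()

# In Colab the call would point at the mounted file instead, for example
# pd.read_csv("/content/drive/MyDrive/real_estate.csv") (hypothetical path).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df)
```
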
- Displaying Basic Dataset Information:
  - Print the dataset using `.head()` to view the first five rows.
  - Display the last five rows of the dataset.
  - Check the dataset's dimensions using the `.shape` attribute.
- Exploring Feature Data Types:
  - Print the column data types using `.dtypes`.
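
As a minimal sketch of the basic inspection and data type steps, assuming a small made-up DataFrame in place of the real dataset:

```python
import pandas as pd

# Hypothetical rows standing in for real_estate.csv.
df = pd.DataFrame({
    "price": [295000, 216500, 279900, 379900, 340000, 265000],
    "sqft": [584, 612, 615, 1618, 2150, 1900],
    "lot_size": [0, 0, 0, 6000, 8500, 7200],
    "property_type": ["Condo", "Condo", "Condo",
                      "Bungalow", "Bungalow", "Bungalow"],
})

first_rows = df.head()     # first five rows by default
last_rows = df.tail()      # last five rows by default
n_rows, n_cols = df.shape  # (rows, columns) tuple
column_types = df.dtypes   # one dtype per column

print(first_rows)
print(last_rows)
print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
```
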
- Plotting Feature Distributions:
  - Use Seaborn's `pairplot` to display distributions of numeric features.
  - Plot a histogram grid using the same method.
- Displaying Formal Summary Statistics:
  - Summarize numerical features with the `.describe()` method.
  - Summarize non-numerical features using `.describe(include='object')`.
  - Observe missing values in the dataset.
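
The summary statistics step can be sketched as follows. The rows are made up, with gaps added on purpose so the missing value check has something to report. Counting gaps with `.isna().sum()` is one common approach and is an assumption here, since this README does not say how the notebook observes them.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with deliberate gaps; property_type is given an explicit
# object dtype so include="object" matches it under any pandas default.
df = pd.DataFrame({
    "price": [295000, 216500, 279900, 379900, 340000, 265000],
    "sqft": [584, 612, np.nan, 1618, 2150, 1900],
    "property_type": pd.Series(
        ["Condo", "Condo", None, "Bungalow", "Bungalow", "Bungalow"],
        dtype="object",
    ),
})

numeric_summary = df.describe()                      # count, mean, std, quartiles
categorical_summary = df.describe(include="object")  # count, unique, top, freq
missing_counts = df.isna().sum()                     # missing values per column

print(numeric_summary)
print(categorical_summary)
print(missing_counts)
```

A `count` below the number of rows in either summary is another sign that a column has missing values.
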
- Exploring Segmentations:
  - Use segmentation to observe the relationship between categorical and numeric features:
    - Plot a box plot of `sqft` by `property_type` using Seaborn.
    - Plot a box plot of `price` by `property_type` using Seaborn.
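
A rough sketch of the two segmentation plots, again on made-up rows. Placing the plots side by side and drawing the boxes horizontally are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside a notebook; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for real_estate.csv.
df = pd.DataFrame({
    "price": [295000, 216500, 279900, 379900, 340000, 265000],
    "sqft": [584, 612, 615, 1618, 2150, 1900],
    "property_type": ["Condo", "Condo", "Condo",
                      "Bungalow", "Bungalow", "Bungalow"],
})

fig, (ax_sqft, ax_price) = plt.subplots(1, 2, figsize=(10, 4))

# One horizontal box per property type, for each numeric feature.
sns.boxplot(data=df, x="sqft", y="property_type", ax=ax_sqft)
sns.boxplot(data=df, x="price", y="property_type", ax=ax_price)

fig.tight_layout()
```
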
- Analyzing Correlations:
  - Calculate correlations between numeric features using the `.corr(numeric_only=True)` method.
- Visualizing Correlation Grids:
  - Plot a heatmap of annotated correlations using Seaborn.
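
The correlation steps might be sketched like this on made-up rows. The colormap, number format, and fixed color range are assumptions chosen for readability. `numeric_only=True` makes `.corr()` skip the `property_type` column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside a notebook; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for real_estate.csv.
df = pd.DataFrame({
    "price": [295000, 216500, 279900, 379900, 340000, 265000],
    "sqft": [584, 612, 615, 1618, 2150, 1900],
    "lot_size": [0, 0, 0, 6000, 8500, 7200],
    "property_type": ["Condo", "Condo", "Condo",
                      "Bungalow", "Bungalow", "Bungalow"],
})

# Pairwise Pearson correlations between the numeric columns only.
corr = df.corr(numeric_only=True)

# Annotated heatmap: each cell shows its correlation to two decimals.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="RdBu_r", vmin=-1, vmax=1, ax=ax)
fig.tight_layout()
```
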
- Observing Minimum Lot Size:
  - Use `.loc` to filter `lot_size` for properties of type `Condo`.
  - Use `.loc` to filter `lot_size` for properties of type `Bungalow`.
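
A minimal sketch of the `.loc` filtering, assuming made-up rows and taking `.min()` on each filtered column to read off the minimum lot size per type:

```python
import pandas as pd

# Hypothetical rows standing in for real_estate.csv.
df = pd.DataFrame({
    "lot_size": [0, 0, 0, 6000, 8500, 7200],
    "property_type": ["Condo", "Condo", "Condo",
                      "Bungalow", "Bungalow", "Bungalow"],
})

# Boolean mask selects the rows; the second argument selects the column.
condo_lots = df.loc[df["property_type"] == "Condo", "lot_size"]
bungalow_lots = df.loc[df["property_type"] == "Bungalow", "lot_size"]

print("Condo minimum lot size:", condo_lots.min())
print("Bungalow minimum lot size:", bungalow_lots.min())
```
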
This project creates a regression model to predict property transaction prices with a mean absolute error (MAE) of under $70,000, providing a data-driven alternative to traditional appraisal methods.
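
For reference, MAE is the average of the absolute differences between actual and predicted prices. The sketch below computes it with NumPy on made-up numbers. They are not results from this project, and the helper name `mean_absolute_error` is chosen for this example only.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical prices, for illustration only.
actual = [295000, 216500, 379900]
predicted = [310000, 200000, 420000]

mae = mean_absolute_error(actual, predicted)
print(f"MAE: ${mae:,.0f}")
print("Under the $70,000 target:", mae < 70_000)
```
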