Exploratory Data Analysis (EDA)

In the EDA_Solution.ipynb file, I explore and analyze a real estate transaction dataset to uncover insights, identify patterns, and prepare the data for building a predictive pricing model.

  • Mounting Google Drive in Google Colab: Access files stored in Google Drive to enable seamless data loading.
  • Importing Libraries and Loading the Dataset:
    • Import the necessary Python libraries:
      • NumPy for numerical computing.
      • Pandas for data manipulation.
      • Matplotlib for visualization.
      • Seaborn for enhanced visualization.
    • Load the real_estate.csv file into a DataFrame.
  • Displaying Basic Dataset Information:
    • Preview the first five rows with .head().
    • Display the last five rows with .tail().
    • Check the dataset's dimensions using the .shape attribute.
  • Exploring Feature Data Types:
    • Print the column data types using .dtypes.
  • Plotting Feature Distributions:
    • Use Seaborn's pairplot to display pairwise relationships between numeric features, with each feature's distribution on the diagonal.
    • Plot a histogram grid using the same method.
  • Displaying Formal Summary Statistics:
    • Summarize numerical features with the .describe() method.
    • Summarize non-numerical features using .describe(include='object').
    • Observe missing values in the dataset.
  • Exploring Segmentations:
    • Use segmentation to observe the relationship between categorical and numeric features:
      • Plot a box plot of sqft by property_type using Seaborn.
      • Plot a box plot of price by property_type using Seaborn.
  • Analyzing Correlations:
    • Calculate correlations between numeric features using the .corr(numeric_only=True) method.
  • Visualizing Correlation Grids:
    • Plot a heatmap of annotated correlations using Seaborn.
  • Observing Minimum Lot Size:
    • Use .loc to filter lot_size for properties of type Condo.
    • Use .loc to filter lot_size for properties of type Bungalow.

This project creates a regression model to predict property transaction prices with a mean absolute error (MAE) of under $70,000, providing a data-driven alternative to traditional appraisal methods.
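
For reference, MAE is the average of the absolute differences between actual and predicted prices. A minimal sketch of the calculation, using hypothetical prices that are not taken from the project's data or results:

```python
import numpy as np

# Hypothetical actual and predicted transaction prices in dollars.
actual = np.array([250_000, 410_000, 325_000, 580_000])
predicted = np.array([265_000, 380_000, 330_000, 640_000])

# Mean absolute error: average size of the prediction error, ignoring direction.
mae = np.mean(np.abs(actual - predicted))
print(f"MAE: ${mae:,.0f}")  # MAE: $27,500

# Compare against the project's stated target of under $70,000.
meets_target = mae < 70_000
```

Because MAE is expressed in dollars, it can be read directly as the typical size of a pricing error.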