House Sales
The dataset contains house sale prices for King County, USA, between May 2014 and May 2015. It is well suited for practicing regression techniques.
Column descriptions
Name | Description |
---|---|
id | A unique identifier |
year | Year of sale |
month | Month of sale |
day | Day of sale |
zipcode | Zipcode |
latitude | Latitude |
longitude | Longitude |
sqft_lot | Lot area in square feet |
sqft_living | Interior living space in square feet |
sqft_above | Interior living space above ground in square feet |
sqft_basement | Interior living space below ground in square feet |
floors | Number of floors |
bedrooms | Number of bedrooms |
bathrooms | Number of bathrooms. Fractional values indicate that components (toilet/sink/shower/bathtub) are missing. |
waterfront | Whether the building overlooks a waterfront (0 = no, 1 = yes) |
view | Rating of the view (1 to 5, higher is better) |
condition | Rating of the condition of the house (1 to 5, higher is better) |
grade | Rating of building construction and design (1 to 13, higher is better) |
year_built | Year the house was built |
year_renovated | Year the house was last renovated. A value of 0 indicates that it was never renovated. |
sqft_lot_15nn | Lot area of the 15 nearest neighbors in square feet |
sqft_living_15nn | Interior living space of the 15 nearest neighbors in square feet |
price | Price the house sold for in USD |
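Some of the columns above usually need light preprocessing before modeling: the sale date is split over three columns, and `year_renovated` uses 0 as a sentinel for "never renovated". The following is a minimal sketch in plain Python, assuming each row is available as a dictionary keyed by column name. The helper `derive_features` and the derived field names are hypothetical and not part of the dataset.

```python
from datetime import date


def derive_features(row):
    """Return a few derived values for one row (a dict keyed by column name).

    The function name and the output keys are illustrative assumptions.
    """
    sale_date = date(row["year"], row["month"], row["day"])
    # year_renovated == 0 means "never renovated", so fall back to year_built.
    renovated = row["year_renovated"] != 0
    last_work_year = row["year_renovated"] if renovated else row["year_built"]
    return {
        "sale_date": sale_date,
        "age_at_sale": row["year"] - row["year_built"],
        "was_renovated": renovated,
        "years_since_last_work": row["year"] - last_work_year,
        "has_basement": row["sqft_basement"] > 0,
    }


# Values taken from the sample row with id 1.
example = {
    "year": 2014, "month": 5, "day": 2,
    "year_built": 1987, "year_renovated": 0,
    "sqft_living": 2090, "sqft_above": 1360, "sqft_basement": 730,
}
features = derive_features(example)
print(features)
```

In the ten sample rows, `sqft_living` equals `sqft_above + sqft_basement`, so the three columns are likely to be collinear; whether this holds for every row is worth checking before fitting a linear model.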
Sample
| | id | year | month | day | zipcode | latitude | longitude | sqft_lot | sqft_living | sqft_above | sqft_basement | floors | bedrooms | bathrooms | waterfront | view | condition | grade | year_built | year_renovated | sqft_lot_15nn | sqft_living_15nn | price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2014 | 5 | 2 | 98001 | 47.3406 | -122.269 | 9397 | 2200 | 2200 | 0 | 2.0 | 4 | 2.50 | 0 | 1 | 3 | 8 | 1987 | 0 | 9176 | 2310 | 285000 |
1 | 1 | 2014 | 5 | 2 | 98003 | 47.3537 | -122.303 | 10834 | 2090 | 1360 | 730 | 1.0 | 3 | 2.50 | 0 | 1 | 4 | 8 | 1987 | 0 | 8595 | 1750 | 285000 |
2 | 2 | 2014 | 5 | 2 | 98006 | 47.5443 | -122.177 | 8119 | 2160 | 1080 | 1080 | 1.0 | 4 | 2.25 | 0 | 1 | 3 | 8 | 1966 | 0 | 9000 | 1850 | 440000 |
3 | 3 | 2014 | 5 | 2 | 98006 | 47.5746 | -122.135 | 8800 | 1450 | 1450 | 0 | 1.0 | 4 | 1.00 | 0 | 1 | 4 | 7 | 1954 | 0 | 8942 | 1260 | 435000 |
4 | 4 | 2014 | 5 | 2 | 98006 | 47.5725 | -122.133 | 10000 | 1920 | 1070 | 850 | 1.0 | 4 | 1.50 | 0 | 1 | 4 | 7 | 1954 | 0 | 10836 | 1450 | 430000 |
5 | 5 | 2014 | 5 | 2 | 98007 | 47.6022 | -122.134 | 6700 | 1570 | 1570 | 0 | 1.0 | 3 | 1.50 | 0 | 1 | 4 | 7 | 1956 | 0 | 7300 | 1570 | 419000 |
6 | 6 | 2014 | 5 | 2 | 98008 | 47.6188 | -122.114 | 8030 | 2000 | 1000 | 1000 | 1.0 | 3 | 2.25 | 0 | 1 | 4 | 8 | 1963 | 0 | 8250 | 2070 | 420000 |
7 | 7 | 2014 | 5 | 2 | 98011 | 47.7698 | -122.222 | 9655 | 2210 | 1460 | 750 | 1.0 | 5 | 2.50 | 0 | 1 | 3 | 8 | 1976 | 0 | 8633 | 2080 | 470000 |
8 | 8 | 2014 | 5 | 2 | 98011 | 47.7419 | -122.205 | 12261 | 2730 | 2730 | 0 | 2.0 | 4 | 2.50 | 0 | 1 | 3 | 9 | 1991 | 0 | 10872 | 2730 | 612500 |
9 | 9 | 2014 | 5 | 2 | 98014 | 47.6517 | -121.906 | 23103 | 1800 | 1800 | 0 | 1.0 | 3 | 1.75 | 0 | 1 | 3 | 7 | 1968 | 0 | 18163 | 1410 | 284000 |
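Since the dataset is intended for regression practice, here is a minimal sketch of an ordinary least-squares fit of `price` on `sqft_living`, using only the ten sample rows shown above. Ten rows are far too few for a meaningful model; the point is only to show the mechanics. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# sqft_living and price columns of the ten sample rows, in table order.
sqft_living = [2200, 2090, 2160, 1450, 1920, 1570, 2000, 2210, 2730, 1800]
price = [285000, 285000, 440000, 435000, 430000,
         419000, 420000, 470000, 612500, 284000]

slope, intercept = fit_line(sqft_living, price)
residuals = [y - (slope * x + intercept) for x, y in zip(sqft_living, price)]
print(f"price = {slope:.2f} * sqft_living + {intercept:.2f}")
```

With an intercept in the model, the residuals of a least-squares fit sum to zero up to floating-point error, which makes a quick sanity check.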
Schema
{ 'id': Integer, 'year': Integer, 'month': Integer, 'day': Integer, 'zipcode': Integer, 'latitude': RealNumber, 'longitude': RealNumber, 'sqft_lot': Integer, 'sqft_living': Integer, 'sqft_above': Integer, 'sqft_basement': Integer, 'floors': RealNumber, 'bedrooms': Integer, 'bathrooms': RealNumber, 'waterfront': Integer, 'view': Integer, 'condition': Integer, 'grade': Integer, 'year_built': Integer, 'year_renovated': Integer, 'sqft_lot_15nn': Integer, 'sqft_living_15nn': Integer, 'price': Integer }
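When the data is read from a text format, every value arrives as a string and has to be cast according to the schema. The sketch below assumes that `Integer` corresponds to Python's `int` and `RealNumber` to `float`; the names `SCHEMA` and `cast_row` are hypothetical and are not the API of whatever library produced the schema above.

```python
import csv
import io

# Assumed mapping: Integer -> int, RealNumber -> float.
SCHEMA = {
    "id": int, "year": int, "month": int, "day": int, "zipcode": int,
    "latitude": float, "longitude": float,
    "sqft_lot": int, "sqft_living": int, "sqft_above": int, "sqft_basement": int,
    "floors": float, "bedrooms": int, "bathrooms": float,
    "waterfront": int, "view": int, "condition": int, "grade": int,
    "year_built": int, "year_renovated": int,
    "sqft_lot_15nn": int, "sqft_living_15nn": int, "price": int,
}


def cast_row(raw):
    """Convert a dict of strings into typed values; raise on unexpected columns."""
    unknown = set(raw) - set(SCHEMA)
    if unknown:
        raise KeyError(f"unexpected columns: {sorted(unknown)}")
    return {name: SCHEMA[name](value) for name, value in raw.items()}


# The sample row with id 0, written as CSV text purely for illustration.
text = (
    ",".join(SCHEMA) + "\n"
    "0,2014,5,2,98001,47.3406,-122.269,9397,2200,2200,0,2.0,4,2.50,"
    "0,1,3,8,1987,0,9176,2310,285000\n"
)
rows = [cast_row(r) for r in csv.DictReader(io.StringIO(text))]
print(rows[0]["price"], rows[0]["bathrooms"])
```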
Statistics
| | metrics | id | year | month | day | zipcode | latitude | longitude | sqft_lot | sqft_living | sqft_above | sqft_basement | floors | bedrooms | bathrooms | waterfront | view | condition | grade | year_built | year_renovated | sqft_lot_15nn | sqft_living_15nn | price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | maximum | 21612 | 2015 | 12 | 31 | 98199 | 47.7776 | -121.315 | 1651359 | 13540 | 9410 | 4820 | 3.5 | 33 | 8.0 | 1 | 5 | 5 | 13 | 2015 | 2015 | 871200 | 6210 | 7700000 |
1 | minimum | 0 | 2014 | 1 | 1 | 98001 | 47.1559 | -122.519 | 520 | 290 | 290 | 0 | 1.0 | 0 | 0.0 | 0 | 1 | 1 | 1 | 1900 | 0 | 651 | 399 | 75000 |
2 | mean | 10806.0 | 2014.3229537778188 | 6.574422801091935 | 15.68819691852126 | 98077.93980474715 | 47.560052519317075 | -122.21389640494147 | 15106.967565816869 | 2079.8997362698374 | 1788.3906907879516 | 291.5090454818859 | 1.4943089807060566 | 3.37084162309721 | 2.1147573219821405 | 0.007541757275713691 | 1.2343034284921113 | 3.4094295100171195 | 7.656873178179799 | 1971.0051357978994 | 84.40225790033776 | 12768.455651691113 | 1986.552491556008 | 540088.1417665294 |
3 | mode | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [2014] | [5] | [23] | [98103] | [47.5322, 47.5491, 47.6624, 47.6846] | [-122.29] | [5000] | [1300] | [1300] | [0] | [1.0] | [3] | [2.5] | [0] | [1] | [3] | [7] | [2014] | [0] | [5000] | [1540] | [350000, 450000] |
4 | median | 10806.0 | 2014.0 | 6.0 | 16.0 | 98065.0 | 47.5718 | -122.23 | 7618.0 | 1910.0 | 1560.0 | 0.0 | 1.5 | 3.0 | 2.25 | 0.0 | 1.0 | 3.0 | 7.0 | 1975.0 | 0.0 | 7620.0 | 1840.0 | 450000.0 |
5 | sum | 233550078 | 43535562 | 142093 | 339069 | 2119758513 | 1027915.4151 | -2641408.943 | 326506890 | 44952873 | 38652488 | 6300385 | 32296.5 | 72854 | 45706.25 | 163 | 26677 | 73688 | 165488 | 42599334 | 1824186 | 275964632 | 42935359 | 11672925008 |
6 | variance | 38928615.166666664 | 0.21866475249047013 | 9.705142556193023 | 74.56430497103068 | 2862.787834812884 | 0.019199901796007925 | 0.01983262201789115 | 1715658774.1754704 | 843533.6813681518 | 685734.6672685076 | 195872.66840096278 | 0.2915880068770518 | 0.8650150097573506 | 0.5931512887356004 | 0.007485225502686407 | 0.5872426169774175 | 0.4234665123939714 | 1.381703289347649 | 862.7972621657613 | 161346.2118623827 | 745518225.3404007 | 469761.23994532344 | 134782378397.24687 |
7 | standard deviation | 6239.280019895457 | 0.467616031045205 | 3.1153077787263688 | 8.635062534286053 | 53.505026257473084 | 0.13856371024192418 | 0.14082834238139405 | 41420.51151513548 | 918.4408970468115 | 828.090977651917 | 442.57504267746816 | 0.5399888951423462 | 0.9300618311474514 | 0.770163157217742 | 0.08651719772788764 | 0.7663175692736123 | 0.650743046366207 | 1.175458756974335 | 29.37341080238659 | 401.6792400191759 | 27304.17963133851 | 685.3913042527776 | 367127.19648269983 |
8 | idness | 1.0 | 9.253689908851155e-05 | 0.0005552213945310692 | 0.001434321935871929 | 0.003238791468097904 | 0.23291537500578355 | 0.03479387405728034 | 0.45259797344190994 | 0.04802665062693749 | 0.04376995326886596 | 0.014158145560542266 | 0.0002776106972655346 | 0.000601489844075325 | 0.0013880534863276732 | 9.253689908851155e-05 | 0.00023134224772127887 | 0.00023134224772127887 | 0.0005552213945310692 | 0.005367140147133669 | 0.003238791468097904 | 0.4020265580900384 | 0.03595058529588673 | 0.18636931476426224 |
9 | stability | 4.6268449544255775e-05 | 0.6770462221810947 | 0.11169203719983344 | 0.04191921528709573 | 0.027853606625641975 | 0.0007865636422523481 | 0.005367140147133669 | 0.016564104936843568 | 0.0063850460371072965 | 0.009808911303382224 | 0.6073196687179012 | 0.4941470411326516 | 0.4545412483227687 | 0.24892425854809605 | 0.9924582427242863 | 0.9017258131680007 | 0.6491926155554527 | 0.41553694535696106 | 0.025864063295238975 | 0.9577106371165502 | 0.019756627955397215 | 0.009114884560218387 | 0.007958173321611993 |
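The table does not define `idness` and `stability`. The numbers are consistent with `idness` being the fraction of distinct values in a column and `stability` being the share of rows taken by the most frequent value. For example, `id` runs from 0 to 21612 with idness 1.0, which suggests 21613 rows, and `year` has two distinct values, giving 2 / 21613 ≈ 9.25e-05. Likewise, the variance of `id` matches the sample (n - 1) formula. Treat these definitions as an inference from the figures, not as documentation of the tool that produced them. The function name `describe` is hypothetical.

```python
import statistics
from collections import Counter


def describe(values):
    """Summarize one column using the assumed metric definitions."""
    counts = Counter(values)
    top_count = max(counts.values())
    n = len(values)
    return {
        "maximum": max(values),
        "minimum": min(values),
        "mean": statistics.mean(values),
        # All values tied for the highest count, as in the table's mode row.
        "mode": sorted(v for v, c in counts.items() if c == top_count),
        "median": statistics.median(values),
        "sum": sum(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "standard deviation": statistics.stdev(values),
        "idness": len(counts) / n,      # assumed: distinct values / rows
        "stability": top_count / n,     # assumed: most frequent count / rows
    }


# The bedrooms column of the ten sample rows.
bedrooms = [4, 3, 4, 4, 4, 3, 3, 5, 4, 3]
summary = describe(bedrooms)
print(summary)
```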
Correlation heatmap
Attribution
This dataset is a modified version of the "House Sales in King County, USA" dataset by Kaggle user harlfoxem. The original dataset is licensed under CC0: Public Domain.
Column descriptions are based on this Kaggle discussion.