Titanic
The dataset contains information about the passengers of the Titanic and whether they survived the accident. It is well suited for practicing classification techniques.
Column descriptions
Name | Description |
---|---|
id | A unique identifier |
name | Name of the passenger |
sex | Sex of the passenger |
age | Age of the passenger at the time of the accident |
siblings_spouses | Number of siblings or spouses aboard |
parents_children | Number of parents or children aboard |
ticket | Ticket number |
travel_class | Travel class (1 = first, 2 = second, 3 = third) |
fare | Fare |
cabin | Cabin number |
port_embarked | Port of embarkation |
survived | Whether the passenger survived the accident (0 = no, 1 = yes) |
Sample
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
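As a first step toward a classification exercise, it can help to look at how the target column relates to a single feature. The sketch below computes the survival rate per sex from the ten sample rows only. The helper name `survival_rate_by` is hypothetical, and the ten rows are the alphabetically first passengers, so the resulting rates are not representative of the full dataset.

```python
from collections import defaultdict

# (sex, survived) pairs copied from the ten sample rows above.
sample = [
    ("male", 0), ("male", 0), ("male", 0), ("female", 1), ("female", 1),
    ("male", 1), ("male", 0), ("female", 1), ("male", 1), ("female", 1),
]


def survival_rate_by(rows):
    """Return the share of survivors for each distinct key in (key, survived) pairs."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for key, survived in rows:
        totals[key] += 1
        survivors[key] += survived
    return {key: survivors[key] / totals[key] for key in totals}


rates = survival_rate_by(sample)
for sex, rate in sorted(rates.items()):
    print(f"{sex}: {rate:.2f}")
```

The same grouping can be applied to `travel_class` or `port_embarked` once the full table is loaded.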
Schema
{ 'id': Integer, 'name': String, 'sex': String, 'age': RealNumber?, 'siblings_spouses': Integer, 'parents_children': Integer, 'ticket': String, 'travel_class': Integer, 'fare': RealNumber?, 'cabin': Anything?, 'port_embarked': Anything?, 'survived': Integer }
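The schema can be mirrored as a typed record, which is a convenient way to see which columns may be empty. This is only a sketch: it assumes that a trailing `?` marks a column that can contain missing values, and it maps `Anything?` to an optional string because the sample shows strings in those columns. The class name `Passenger` and the helper `missing_columns` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Passenger:
    id: int
    name: str
    sex: str
    age: Optional[float]            # RealNumber? in the schema
    siblings_spouses: int
    parents_children: int
    ticket: str
    travel_class: int
    fare: Optional[float]           # RealNumber? in the schema
    cabin: Optional[str]            # Anything? in the schema (assumed to hold strings)
    port_embarked: Optional[str]    # Anything? in the schema (assumed to hold strings)
    survived: int


def missing_columns(passenger):
    """List the names of all fields whose value is None."""
    return [f.name for f in fields(passenger) if getattr(passenger, f.name) is None]


# First sample row; None stands in for the NaN cabin value.
first_row = Passenger(
    0, "Abbing, Mr. Anthony", "male", 42.0, 0, 0,
    "C.A. 5547", 3, 7.55, None, "Southampton", 0,
)
print(missing_columns(first_row))
```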
Statistics
| | metrics | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | maximum | 1308 | - | - | 80.0 | 8 | 9 | - | 3 | 512.3292 | - | - | 1 |
1 | minimum | 0 | - | - | 0.1667 | 0 | 0 | - | 1 | 0.0 | - | - | 0 |
2 | mean | 654.0 | - | - | 29.8811345124283 | 0.4988540870893812 | 0.3850267379679144 | - | 2.294881588999236 | 33.29547928134557 | - | - | 0.3819709702062643 |
3 | mode | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | ['Connolly, Miss. Kate', 'Kelly, Mr. James'] | ['male'] | [24.0] | [0] | [0] | ['CA. 2343'] | [3] | [8.05] | ['C23 C25 C27'] | ['Southampton'] | [0] |
4 | median | 654.0 | - | - | 28.0 | 0.0 | 0.0 | - | 3.0 | 14.4542 | - | - | 0.0 |
5 | sum | 856086 | - | - | 31255.6667 | 653 | 504 | - | 3004 | 43550.4869 | - | - | 500 |
6 | variance | 142899.16666666666 | - | - | 207.7489735996977 | 1.0850522026992615 | 0.7491945902631278 | - | 0.7019691946837118 | 2678.959737892891 | - | - | 0.2362496291260457 |
7 | standard deviation | 378.0200611960517 | - | - | 14.4134996999236 | 1.041658390596102 | 0.8655602753495147 | - | 0.8378360189701275 | 51.75866823917411 | - | - | 0.4860551708664827 |
8 | idness | 1.0 | 0.998472116119175 | 0.0015278838808250573 | 0.0748663101604278 | 0.0053475935828877 | 0.006111535523300229 | 0.7097020626432391 | 0.002291825821237586 | 0.21466768525592056 | 0.14209320091673033 | 0.002291825821237586 | 0.0015278838808250573 |
9 | stability | 0.0007639419404125286 | 0.0015278838808250573 | 0.6440030557677616 | 0.044933078393881457 | 0.680672268907563 | 0.7654698242933538 | 0.008403361344537815 | 0.5416348357524828 | 0.045871559633027525 | 0.020338983050847456 | 0.6993114001530222 | 0.6180290297937356 |
Correlation heatmap
Attribution
The dataset is a modified version of the "Titanic" dataset by Frank E. Harrell Jr. and Thomas Cason:
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of half of the passengers. The principal source for data about Titanic passengers is the Encyclopedia Titanica. The datasets used here were begun by a variety of researchers. One of the original sources is Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, which includes a passenger list created by many researchers and edited by Michael A. Findlay.
Thomas Cason of UVa has greatly updated and improved the titanic data frame using the Encyclopedia Titanica and created the dataset here. Some duplicate passengers have been dropped, many errors corrected, many missing ages filled in, and new variables created.