Oct 10, 2018

Example 11: Titanic Survival Predictions

Machine Learning from Titanic Disaster


This example presents a complete analysis for predicting survival in the infamous Titanic sinking. The analysis is done in a Jupyter Notebook.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class passengers.

In this analysis we try to understand what sorts of people were likely to survive, and we apply the tools of machine learning to predict which passengers survived the tragedy.

The datasets with the training and testing passenger details are downloaded from the Kaggle site. The training dataset contains the survival information along with the passenger details, whereas the testing dataset does not contain the survival information; that is what we have to predict.
We use the Pandas and NumPy libraries to analyse the training and testing CSV datasets.
In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import re
train_data = pd.read_csv('./MLData/01_Titanic_Survivals/train.csv')
test_data = pd.read_csv('./MLData/01_Titanic_Survivals/test.csv')
full_data = [train_data, test_data]
print(train_data.info(), '\n')
print(test_data.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
None 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId    418 non-null int64
Pclass         418 non-null int64
Name           418 non-null object
Sex            418 non-null object
Age            332 non-null float64
SibSp          418 non-null int64
Parch          418 non-null int64
Ticket         418 non-null object
Fare           417 non-null float64
Cabin          91 non-null object
Embarked       418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
None
In [2]:
train_data.head(10)
Out[2]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
6 7 0 1 McCarthy, Mr. Timothy J male 54.0 0 0 17463 51.8625 E46 S
7 8 0 3 Palsson, Master. Gosta Leonard male 2.0 3 1 349909 21.0750 NaN S
8 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 2 347742 11.1333 NaN S
9 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 0 237736 30.0708 NaN C
In [3]:
test_data.head(3)
Out[3]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
In [4]:
train_data.describe()
Out[4]:
PassengerId Survived Pclass Age SibSp Parch Fare
count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200
In [5]:
test_data.describe()
Out[5]:
PassengerId Pclass Age SibSp Parch Fare
count 418.000000 418.000000 332.000000 418.000000 418.000000 417.000000
mean 1100.500000 2.265550 30.272590 0.447368 0.392344 35.627188
std 120.810458 0.841838 14.181209 0.896760 0.981429 55.907576
min 892.000000 1.000000 0.170000 0.000000 0.000000 0.000000
25% 996.250000 1.000000 21.000000 0.000000 0.000000 7.895800
50% 1100.500000 3.000000 27.000000 0.000000 0.000000 14.454200
75% 1204.750000 3.000000 39.000000 1.000000 0.000000 31.500000
max 1309.000000 3.000000 76.000000 8.000000 9.000000 512.329200
In [6]:
train_data.isnull().sum()
Out[6]:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
In [7]:
test_data.isnull().sum()
Out[7]:
PassengerId      0
Pclass           0
Name             0
Sex              0
Age             86
SibSp            0
Parch            0
Ticket           0
Fare             1
Cabin          327
Embarked         0
dtype: int64
Checking the relation between passenger class (Pclass) and survival (Survived) in the training dataset, and then between sex and survival.
In [8]:
print(train_data[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean())
   Pclass  Survived
0       1  0.629630
1       2  0.472826
2       3  0.242363
In [9]:
print(train_data[['Sex', 'Survived']].groupby(['Sex'], as_index=False).mean())
      Sex  Survived
0  female  0.742038
1    male  0.188908
The column SibSp gives the number of siblings/spouses aboard, and Parch the number of parents/children aboard. From these two we can create a new feature, FamilySize.

Pandas offers two ways of selecting data: by row number (.iloc) and by label or boolean condition (.loc), as the quick sketch below illustrates.
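A quick sketch (not part of the original notebook) of the difference on this dataset:

train_data.iloc[0:3]                                    # first three rows, selected by integer position
train_data.loc[train_data['Sex'] == 'female'].head(3)   # rows selected by a boolean condition on a column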
In [10]:
for dataset in full_data:
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1
print(train_data[['FamilySize', 'Survived']].groupby(['FamilySize'], as_index=False).mean())
   FamilySize  Survived
0           1  0.303538
1           2  0.552795
2           3  0.578431
3           4  0.724138
4           5  0.200000
5           6  0.136364
6           7  0.333333
7           8  0.000000
8          11  0.000000
FamilySize is always at least 1 (the passenger him- or herself), so FamilySize == 1 means the passenger is travelling alone and FamilySize > 1 means family is aboard. Note that the code below sets the flag to 1 exactly when FamilySize == 1; despite its name, IsNotAlone = 1 therefore marks passengers travelling alone, and the survival rates in the table should be read accordingly.
In [11]:
for dataset in full_data:
    dataset['IsNotAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsNotAlone'] = 1
print(train_data[['IsNotAlone', 'Survived']].groupby(['IsNotAlone'], as_index=False).mean())
   IsNotAlone  Survived
0           0  0.505650
1           1  0.303538
The column Embarked gives the port of embarkation (embarkation being the process of boarding passengers onto a ship): C = Cherbourg, Q = Queenstown, S = Southampton.

The Embarked column has a couple of missing values, so we fill them with its most frequent value ('S'), determined with the following code.
In [12]:
train_data['Embarked'].value_counts()
Out[12]:
S    644
C    168
Q     77
Name: Embarked, dtype: int64
In [13]:
for dataset in full_data:
    dataset['Embarked'] = dataset['Embarked'].fillna('S')
print(train_data[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean())
  Embarked  Survived
0        C  0.553571
1        Q  0.389610
2        S  0.339009
Can Fare play a role in survival?

Fare has a single missing value (in the test set), which we replace with the median fare from the training set. We then categorize the fares into four ranges.

Here we use the qcut function from the pandas library. qcut is a quantile-based discretization function: it splits a variable into roughly equal-sized buckets based on sample quantiles (i.e. on rank).
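A toy illustration (not from the original notebook) of the equal-sized buckets:

demo = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
print(pd.qcut(demo, 4).value_counts())   # each of the four bins holds exactly two values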
In [14]:
for dataset in full_data:
    dataset['Fare'] = dataset['Fare'].fillna(train_data['Fare'].median())
train_data['CategoricalFare'] = pd.qcut(train_data['Fare'], 4)
print (train_data[['CategoricalFare', 'Survived']].groupby(['CategoricalFare'], as_index=False).mean())
   CategoricalFare  Survived
0   (-0.001, 7.91]  0.197309
1   (7.91, 14.454]  0.303571
2   (14.454, 31.0]  0.454955
3  (31.0, 512.329]  0.581081
Age could play a role in survival: children and the elderly may have been given preference for the lifeboats and so been more likely to survive.

But this feature has plenty of missing values. We fill them with random numbers drawn between (mean - std. dev.) and (mean + std. dev.), then categorize the ages into five ranges.

Use the cut function from the pandas library when you need to segment and sort data values into bins; it is a convenient way to turn a continuous variable into a categorical one.
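In contrast to qcut, cut splits the value range into equal-width intervals, regardless of how many observations land in each. Another toy sketch (not from the original notebook):

demo = pd.Series([1, 2, 3, 4, 5, 6, 7, 80])
print(pd.cut(demo, 2).value_counts())   # two equal-width bins: seven values in the first, one in the second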
In [15]:
for dataset in full_data:
    age_avg = dataset['Age'].mean()
    age_std = dataset['Age'].std()
    age_null_count = dataset['Age'].isnull().sum()
    
    # fill missing ages with random integers drawn from [mean - std, mean + std)
    age_null_random_list = np.random.randint(age_avg - age_std, age_avg + age_std, size = age_null_count)
    dataset.loc[dataset['Age'].isnull(), 'Age'] = age_null_random_list
    dataset['Age'] = dataset['Age'].astype(int)

train_data['CategoricalAge'] = pd.cut(train_data['Age'], 5)

print(train_data[['CategoricalAge', 'Survived']].groupby(['CategoricalAge'], as_index=False).mean())
  CategoricalAge  Survived
0  (-0.08, 16.0]  0.535088
1   (16.0, 32.0]  0.346847
2   (32.0, 48.0]  0.379447
3   (48.0, 64.0]  0.434783
4   (64.0, 80.0]  0.090909
In [16]:
def get_title(name):
    # extract the title: the word immediately followed by a period (e.g. 'Mr', 'Miss')
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    if title_search:
        return title_search.group(1)
    return ""

for dataset in full_data:
    dataset['Title'] = dataset['Name'].apply(get_title)

print(pd.crosstab(train_data['Title'], train_data['Sex']))
Sex       female  male
Title                 
Capt           0     1
Col            0     2
Countess       1     0
Don            0     1
Dr             1     6
Jonkheer       0     1
Lady           1     0
Major          0     2
Master         0    40
Miss         182     0
Mlle           2     0
Mme            1     0
Mr             0   517
Mrs          125     0
Ms             1     0
Rev            0     6
Sir            0     1
A passenger's title could also be a parameter that bears on survival; the title can be extracted from the Name feature, as done above with get_title.

Python's built-in re module provides support for regular expressions (regex patterns).
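A quick check (not from the original notebook) of the pattern used in get_title:

sample = 'Braund, Mr. Owen Harris'
match = re.search(r' ([A-Za-z]+)\.', sample)   # a word followed by a period
print(match.group(1))                          # -> 'Mr'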
In [17]:
for dataset in full_data:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess','Capt', 'Col', 'Don', 'Dr',\
                                                 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')
    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

print(train_data[['Title', 'Survived']].groupby(['Title'], as_index=False).mean())
    Title  Survived
0  Master  0.575000
1    Miss  0.702703
2      Mr  0.156673
3     Mrs  0.793651
4    Rare  0.347826

Data Cleaning


Now let's clean the data and map the features to numerical values so that machine learning algorithms can be applied. First, a look at both datasets with the engineered features attached:
In [18]:
print(full_data)
[     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
5              6         0       3    
..           ...       ...     ...    
885          886         0       3   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex  Age  SibSp  \
0                              Braund, Mr. Owen Harris    male   22      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female   38      1   
2                               Heikkinen, Miss. Laina  female   26      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female   35      1   
4                             Allen, Mr. William Henry    male   35      0   
..                                                 ...     ...  ...    ...   
885               Rice, Mrs. William (Margaret Norton)  female   39      0   
886                              Montvila, Rev. Juozas    male   27      0   
887                       Graham, Miss. Margaret Edith  female   19      0   
888           Johnston, Miss. Catherine Helen "Carrie"  female   15      1   
889                              Behr, Mr. Karl Howell    male   26      0   
890                                Dooley, Mr. Patrick    male   32      0   

     Parch            Ticket      Fare        Cabin Embarked  FamilySize  \
0        0         A/5 21171    7.2500          NaN        S           2   
1        0          PC 17599   71.2833          C85        C           2   
2        0  STON/O2. 3101282    7.9250          NaN        S           1   
3        0            113803   53.1000         C123        S           2   
4        0            373450    8.0500          NaN        S           1   
5        0            330877    8.4583          NaN        Q           1   
..     ...               ...       ...          ...      ...         ...      
885      5            382652   29.1250          NaN        Q           6   
886      0            211536   13.0000          NaN        S           1   
887      0            112053   30.0000          B42        S           1   
888      2        W./C. 6607   23.4500          NaN        S           4   
889      0            111369   30.0000         C148        C           1   
890      0            370376    7.7500          NaN        Q           1   

     IsNotAlone  CategoricalFare CategoricalAge   Title  
0             0   (-0.001, 7.91]   (16.0, 32.0]      Mr  
1             0  (31.0, 512.329]   (32.0, 48.0]     Mrs  
2             1   (7.91, 14.454]   (16.0, 32.0]    Miss  
3             0  (31.0, 512.329]   (32.0, 48.0]     Mrs  
4             1   (7.91, 14.454]   (32.0, 48.0]      Mr  
5             1   (7.91, 14.454]   (16.0, 32.0]      Mr  
..          ...              ...            ...     ...  
885           0   (14.454, 31.0]   (32.0, 48.0]     Mrs  
886           1   (7.91, 14.454]   (16.0, 32.0]    Rare  
887           1   (14.454, 31.0]   (16.0, 32.0]    Miss  
888           0   (14.454, 31.0]  (-0.08, 16.0]    Miss  
889           1   (14.454, 31.0]   (16.0, 32.0]      Mr  
890           1   (-0.001, 7.91]   (16.0, 32.0]      Mr  

[891 rows x 17 columns],      PassengerId  Pclass                                               Name  \
0            892       3                                   Kelly, Mr. James   
1            893       3                   Wilkes, Mrs. James (Ellen Needs)   
2            894       2                          Myles, Mr. Thomas Francis   
3            895       3                                   Wirz, Mr. Albert   
4            896       3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)   
5            897       3                         Svensson, Mr. Johan Cervin   
..           ...     ...                                                ...    
412         1304       3                     Henriksson, Miss. Jenny Lovisa   
413         1305       3                                 Spector, Mr. Woolf   
414         1306       1                       Oliva y Ocana, Dona. Fermina   
415         1307       3                       Saether, Mr. Simon Sivertsen   
416         1308       3                                Ware, Mr. Frederick   
417         1309       3                           Peter, Master. Michael J   

        Sex  Age  SibSp  Parch              Ticket      Fare            Cabin  \
0      male   34      0      0              330911    7.8292              NaN   
1    female   47      1      0              363272    7.0000              NaN   
2      male   62      0      0              240276    9.6875              NaN   
3      male   27      0      0              315154    8.6625              NaN   
4    female   22      1      1             3101298   12.2875              NaN   
5      male   14      0      0                7538    9.2250              NaN   
..      ...  ...    ...    ...                 ...       ...              ...    
412  female   28      0      0              347086    7.7750              NaN   
413    male   21      0      0           A.5. 3236    8.0500              NaN   
414  female   39      0      0            PC 17758  108.9000             C105   
415    male   38      0      0  SOTON/O.Q. 3101262    7.2500              NaN   
416    male   33      0      0              359309    8.0500              NaN   
417    male   31      1      1                2668   22.3583              NaN   

    Embarked  FamilySize  IsNotAlone   Title  
0          Q           1           1      Mr  
1          S           2           0     Mrs  
2          Q           1           1      Mr  
3          S           1           1      Mr  
4          S           3           0     Mrs  
5          S           1           1      Mr  
..       ...         ...         ...     ...   
412        S           1           1    Miss  
413        S           1           1      Mr  
414        C           1           1    Rare  
415        S           1           1      Mr  
416        S           1           1      Mr  
417        C           3           0  Master  

[418 rows x 14 columns]]
In [19]:
for dataset in full_data:
    # Mapping Sex
    dataset['Sex'] = dataset['Sex'].map( {'female': 0, 'male': 1} ).astype(int)
    
    # Mapping Titles
    title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)
    
    # Mapping Embarked
    dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
    
    # Mapping Fare
    dataset.loc[ dataset['Fare'] <= 7.91, 'Fare'] = 0
    dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1
    dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare'] = 2
    dataset.loc[ dataset['Fare'] > 31, 'Fare'] = 3
    dataset['Fare'] = dataset['Fare'].astype(int)
    
    # Mapping Age
    dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0
    dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
    dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2
    dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3
    dataset.loc[ dataset['Age'] > 64, 'Age'] = 4

# Feature Selection
drop_elements = ['PassengerId', 'Name', 'Ticket', 'Cabin', 'SibSp', 'Parch', 'FamilySize']
train_data = train_data.drop(drop_elements, axis = 1)
train_data = train_data.drop(['CategoricalAge', 'CategoricalFare'], axis = 1)

test_data  = test_data.drop(drop_elements, axis = 1)

print (train_data.head(10))

train_data = train_data.values
test_data  = test_data.values
   Survived  Pclass  Sex  Age  Fare  Embarked  IsNotAlone  Title
0         0       3    1    1     0         0           0      1
1         1       1    0    2     3         1           0      3
2         1       3    0    1     1         0           1      2
3         1       1    0    2     3         0           0      3
4         0       3    1    2     1         0           1      1
5         0       3    1    1     1         2           1      1
6         0       1    1    3     3         0           1      1
7         0       3    1    0     2         0           0      4
8         1       3    0    1     1         0           0      3
9         1       2    0    0     2         1           0      3

Classifier Comparison


Comparing a range of classifiers from the scikit-learn (sklearn) package to see which model gives the best accuracy.
In [20]:
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, log_loss
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

classifiers = [
    KNeighborsClassifier(3),
    SVC(gamma=2, C=1), # probability=True
    DecisionTreeClassifier(max_depth=7),
    RandomForestClassifier(max_depth=2, n_estimators=10, random_state=0),
    AdaBoostClassifier(),
    GradientBoostingClassifier(),
    GaussianNB(),
    LinearDiscriminantAnalysis(),
    QuadraticDiscriminantAnalysis(),
    LogisticRegression(random_state=42, solver='lbfgs')]

log_cols = ["Classifier", "Accuracy"]
log = pd.DataFrame(columns=log_cols)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)

X = train_data[0::, 1::]
y = train_data[0::, 0]

acc_dict = {}

for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    for clf in classifiers:
        name = clf.__class__.__name__
        clf.fit(X_train, y_train)
        train_predictions = clf.predict(X_test)
        acc = accuracy_score(y_test, train_predictions)
        if name in acc_dict:
            acc_dict[name] += acc
        else:
            acc_dict[name] = acc

for clf in acc_dict:
    acc_dict[clf] = acc_dict[clf] / 10.0
    log_entry = pd.DataFrame([[clf, acc_dict[clf]]], columns=log_cols)
    log = pd.concat([log, log_entry], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

plt.xlabel('Accuracy')
plt.title('Classifier Accuracy')

sns.set_color_codes("muted")
sns.barplot(x='Accuracy', y='Classifier', data=log, color="b")
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x255717f4c88>

Prediction with the best Classifier


The best classifier appears to be GradientBoostingClassifier, so we use it to predict survival for the test data.
In [21]:
candidate_classifier = GradientBoostingClassifier()
candidate_classifier.fit(train_data[0::, 1::], train_data[0::, 0])
result = candidate_classifier.predict(test_data)
print(result)
[0 1 0 0 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 0 0 1
 1 0 0 1 0 1 1 0 0 0 0 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
 1 1 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1
 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 1 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0
 1 0 1 1 0 1 0 0 1 1 0 0 1 0 0 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1
 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 1 0 0
 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0
 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 1 0
 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0
 0 1 1 1 1 0 0 1 0 0 1]


The Final Result


The prediction results above correspond, row by row, to the passengers in the test dataset, where 1 means survived and 0 means did not survive.
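As a final (hypothetical) step, the predictions could be paired with the passenger ids to produce a Kaggle-style submission file. A minimal sketch, assuming the original test CSV is still available at the path used earlier (test_data itself was converted to a bare array above, so the ids must be re-read):

ids = pd.read_csv('./MLData/01_Titanic_Survivals/test.csv')['PassengerId']
submission = pd.DataFrame({'PassengerId': ids, 'Survived': result.astype(int)})
submission.to_csv('titanic_submission.csv', index=False)   # hypothetical output filename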