Predicting Delayed Flights

The file contains information

on all commercial flights departing the Washington, DC area and arriving at New

York during January 2004. For each flight there is information on the departure and

arrival airports, the distance of the route, the scheduled time and date of the flight, and

so on. The variable that we are trying to predict is whether or not a flight is delayed.

A delay is defined as an arrival that is at least 15 minutes later than scheduled.
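The delay definition above can be written as a small check. This is a minimal Python sketch rather than part of the JMP workflow the exercise asks for; the function name `is_delayed` and the use of `datetime` values for the scheduled and actual arrival are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Threshold taken from the definition above: at least 15 minutes late.
DELAY_THRESHOLD = timedelta(minutes=15)


def is_delayed(scheduled_arrival: datetime, actual_arrival: datetime) -> bool:
    """Return True if the arrival is at least 15 minutes later than scheduled."""
    return actual_arrival - scheduled_arrival >= DELAY_THRESHOLD


scheduled = datetime(2004, 1, 5, 9, 0)
print(is_delayed(scheduled, datetime(2004, 1, 5, 9, 14)))  # 14 minutes late
print(is_delayed(scheduled, datetime(2004, 1, 5, 9, 15)))  # exactly 15 minutes late
```

Note that "at least 15 minutes" makes the boundary inclusive, so a flight arriving exactly 15 minutes late counts as delayed.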

Data preprocessing. Bin the scheduled departure time (CRS_DEP_TIME) into 8

bins. This will avoid treating the departure time as a continuous predictor, because

it is reasonable that delays are related to rush-hour times. (Note that these data are

not stored in JMP with a time format, so you’ll need to explore the best way to bin

this data – two options are (1) via the formula editor and (2) using the Make Binning

Formula column utility.) Partition the data into training and validation sets.
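The two preprocessing steps can also be sketched outside JMP. The Python below is only an illustration under stated assumptions: it assumes the scheduled departure time is stored as an integer in HHMM form (for example, 1455 for 2:55 PM), it uses eight equal-width 3-hour bins, and it uses a 60/40 training/validation split. None of these choices is specified by the exercise, and the function names are hypothetical.

```python
import random


def bin_departure_time(hhmm: int, n_bins: int = 8) -> int:
    """Map an HHMM integer (assumed storage format) to a bin index 0..n_bins-1."""
    hour, minute = divmod(hhmm, 100)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"not a valid HHMM time: {hhmm}")
    minutes_since_midnight = hour * 60 + minute
    bin_width = 24 * 60 / n_bins  # 180 minutes when n_bins is 8
    return int(minutes_since_midnight // bin_width)


def make_validation_column(n_rows: int, validation_fraction: float = 0.4,
                           seed: int = 1) -> list:
    """Randomly label rows as Training or Validation (split fraction is assumed)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_validation = round(n_rows * validation_fraction)
    validation_rows = set(indices[:n_validation])
    return ["Validation" if i in validation_rows else "Training"
            for i in range(n_rows)]


print([bin_departure_time(t) for t in (600, 1455, 2359)])
print(make_validation_column(10))
```

Equal-width bins are the simplest option; bins aligned with known rush-hour periods may be a better fit for the rationale given above, at the cost of choosing the cut points by hand.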

a. Fit a classification tree to the flight delay variable using all the relevant predictors

(use the binned version of the departure time) and the validation column. Do not

include DEP_TIME (actual departure time) in the model because it is unknown at

the time of prediction (unless we are predicting delays after the plane

takes off, which is unlikely).

i. How many splits are in the final model?

ii. How many variables are involved in the splits?

iii. Which variables contribute the most to the model?

iv. Which variables were not involved in any of the splits?

v. Express the resulting tree as a set of rules.

vi. If you needed to fly between DCA and EWR on a Monday at 7 AM, would

you be able to use this tree to predict whether the flight will be delayed? What

other information would you need? Is this information available in practice?

What information is redundant?
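For the question about expressing a tree as a set of rules, the general technique is to walk each path from the root to a leaf and join the split conditions along the way. The Python sketch below shows this on a toy tree. The tree, its split conditions (`predictor_A`, `predictor_B`), and its leaf labels are entirely hypothetical and are not the tree this exercise produces.

```python
# Hypothetical toy tree: each internal node has a split condition and two
# branches; each leaf has a predicted class. Not the fitted flight-delay tree.
toy_tree = {
    "split": "predictor_A = 1",
    "yes": {"leaf": "delayed"},
    "no": {
        "split": "predictor_B >= 5",
        "yes": {"leaf": "delayed"},
        "no": {"leaf": "ontime"},
    },
}


def tree_to_rules(node, conditions=()):
    """Return one IF-THEN rule per leaf by collecting conditions along each path."""
    if "leaf" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['leaf']}"]
    rules = tree_to_rules(node["yes"], conditions + (node["split"],))
    rules += tree_to_rules(node["no"], conditions + (f"NOT ({node['split']})",))
    return rules


for rule in tree_to_rules(toy_tree):
    print(rule)
```

A tree with k leaves yields k rules, one per terminal node, so the number of rules is always one more than the number of binary splits.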

b. Fit another tree, this time using the original scheduled departure time rather than

the binned version. Save the formula for this model to the data table (we’ll return

to this in a future exercise).

i. Compare this tree to the original, in terms of the number of splits and the

number of variables involved. What are the key differences?
