Appearances {Lahman} | R Documentation
Data on player appearances
data(Appearances)
A data frame with 99466 observations on the following 21 variables.
yearID
Year
teamID
Team; a factor
lgID
League; a factor with levels AA AL FL NL PL UA
playerID
Player ID code
G_all
Total games played
GS
Games started
G_batting
Games in which player batted
G_defense
Games in which player appeared on defense
G_p
Games as pitcher
G_c
Games as catcher
G_1b
Games as first baseman
G_2b
Games as second baseman
G_3b
Games as third baseman
G_ss
Games as shortstop
G_lf
Games as left fielder
G_cf
Games as center fielder
G_rf
Games as right fielder
G_of
Games as outfielder
G_dh
Games as designated hitter
G_ph
Games as pinch hitter
G_pr
Games as pinch runner
The Appearances table in the original version has some incorrect variable names.
In particular, the 5th column is career_year.
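As a rough illustration of working around a misnamed column, the sketch below renames it on a small stand-in data frame. The data frame `raw`, its values, and the player ID are made up, and mapping `career_year` to `G_all` is an assumption based on the 5th entry in the variable list above, not something the original database documents.

```r
# Toy stand-in for a raw table whose 5th column carries the wrong name.
# All values and the player ID are invented for illustration only.
raw <- data.frame(
  yearID      = c(2001L, 2002L),
  teamID      = c("AAA", "AAA"),
  lgID        = c("AL", "AL"),
  playerID    = c("examplpl01", "examplpl01"),
  career_year = c(92L, 13L),
  stringsAsFactors = FALSE
)

# Assumption: the 5th column is meant to be G_all, as in the variable list.
names(raw)[names(raw) == "career_year"] <- "G_all"
names(raw)
```

Renaming by matching on the old name rather than by position leaves the data frame unchanged if the column is already named correctly.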
Lahman, S. (2015) Lahman's Baseball Database, 1871-2014, 2015 version, http://baseball1.com/statistics/
library(Lahman)
data(Appearances)

# some test cases

# Henry Aaron spent the last two years of his career as DH in Milwaukee
subset(Appearances, playerID == 'aaronha01')

# Herb Washington, strictly a pinch runner for Oakland in 1974-5
subset(Appearances, playerID == 'washihe01')

subset(Appearances, playerID == 'thomeji01')
subset(Appearances, playerID == 'hairsje02')

# Appearances for the 1984 Cleveland Indians
subset(Appearances, teamID == "CLE" & yearID == 1984)

# Use && so the second package is only checked if the first one loads
if (require(reshape2) && require(plyr)) {

  # Appearances for Pete Rose during his career:
  prose <- subset(Appearances, playerID == "rosepe01")

  # What was Pete Rose's primary position each year
  # of his career?
  # Columns 9:17 are the position counts G_p through G_rf
  prose_melt <- melt(prose, id = c("yearID", "teamID"), measure = 9:17)

  # Split out the position from variable
  prose_melt <- cbind(prose_melt,
                      colsplit(prose_melt$variable, "_", names = c("G", "pos")))

  # Two grouping variables because of an in-season trade in 1984
  primary_pos <- ddply(prose_melt, .(yearID, teamID), summarise,
                       top_pos = pos[which.max(value)],
                       games = max(value))
  primary_pos

  # Most pitcher appearances each year since 1950
  ddply(subset(Appearances, yearID >= 1950), .(yearID), summarise,
        maxPitcher = playerID[which.max(G_p)],
        maxAppear = max(G_p))

  # Individuals who have played all 162 games since 1961
  all162 <- ddply(subset(Appearances, yearID > 1960), .(yearID), summarise,
                  allGamers = playerID[G_all == 162])

  # Number of all-gamers by year
  table(all162$yearID)
}
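The primary-position idea from the Pete Rose example can also be sketched in base R, without reshape2 or plyr. The version below runs on a small made-up data frame (`toy`, with invented team codes and game counts) so that it is self-contained. It assumes one row per year and team, and it only covers three of the position columns.

```r
# Made-up appearance counts for one hypothetical player; not real data.
toy <- data.frame(
  yearID = c(2001L, 2002L, 2003L),
  teamID = c("AAA", "AAA", "BBB"),
  G_1b   = c(10L, 80L, 5L),
  G_3b   = c(120L, 40L, 0L),
  G_lf   = c(3L, 20L, 150L),
  stringsAsFactors = FALSE
)

pos_cols <- c("G_1b", "G_3b", "G_lf")
games <- as.matrix(toy[pos_cols])

# For each row, find the position column with the most games.
# ties.method = "first" picks the leftmost column on ties.
top <- max.col(games, ties.method = "first")

primary_pos <- data.frame(
  yearID  = toy$yearID,
  teamID  = toy$teamID,
  top_pos = sub("^G_", "", pos_cols[top]),
  games   = games[cbind(seq_len(nrow(games)), top)],
  stringsAsFactors = FALSE
)
primary_pos
```

With the real data, `pos_cols` would presumably be the nine columns from `G_p` through `G_rf`, matching `measure = 9:17` in the example above.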