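I have a dataset like this. I want to identify all observations that have more than one value in the "color" columns and replace them with "multicolor".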
ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange
Like this:
ID color
23 red
44 multicolor
51 yellow
59 multicolor
Any ideas would be greatly appreciated, thanks!
Answer:
Here is one way to do this in the tidyverse.
library(dplyr)
library(tidyr)
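df %>%
  # one row per (ID, color) pair; NA colors are dropped
  pivot_longer(cols = starts_with("color"), values_to = "color", values_drop_na = TRUE) %>%
  group_by(ID) %>%
  # n = number of non-missing colors, color = those colors joined with ", "
  summarize(n = n(),
            color = toString(color), .groups = "drop") %>%
  mutate(color = if_else(n > 1, "multicolor", color)) %>%
  select(-n)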
# # A tibble: 4 x 2
# ID color
# <int> <chr>
# 1 23 red
# 2 44 multicolor
# 3 51 yellow
# 4 59 multicolor
I did it this way on purpose. Note that if you stop right after the summarize() line, you get the actual colors:
# # A tibble: 4 x 3
# ID n color
# <int> <int> <chr>
# 1 23 1 red
# 2 44 2 blue, purple
# 3 51 1 yellow
# 4 59 2 green, orange
This scales to many color columns, not just 2. Starting from that intermediate result, there are many ways to adjust the output.
Data
df <- read.table(textConnection("ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange"), header = TRUE)
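For comparison, here is a base R sketch of the same idea: count the non-missing colors per row, keep the joined colors as an intermediate (like stopping after summarize()), then relabel. It assumes the color columns are the ones whose names start with "color" and that missing colors are read as NA; the names vals, n_colors and joined are only illustrative.

```r
df <- read.table(textConnection("ID color1 color2
23 red NA
44 blue purple
51 yellow NA
59 green orange"), header = TRUE, stringsAsFactors = FALSE)

# Assumption: every color column's name starts with "color"
color_cols <- grep("^color", names(df), value = TRUE)
m <- as.matrix(df[color_cols])

# Non-missing colors of each row, as a list of character vectors
vals <- lapply(seq_len(nrow(m)), function(i) m[i, !is.na(m[i, ])])

# Intermediate: number of colors and the colors joined with ", "
n_colors <- lengths(vals)
joined <- vapply(vals, toString, character(1))

# Final: rows with more than one color become "multicolor"
result <- data.frame(ID = df$ID,
                     color = ifelse(n_colors > 1, "multicolor", joined),
                     stringsAsFactors = FALSE)
result
```

Because it only counts non-missing entries per row, it works unchanged for any number of color columns.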
Answer:
You can do it like this, assuming data is your dataset.
library(dplyr)
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
data %>%
  mutate(color = ifelse(is.na(color2), color1, "multicolor")) %>%
  select(ID, color)
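The ifelse(is.na(color2), ...) test assumes that color1 is always filled. If a row could have only color2 set, a small base R variation covers that case too. This is a sketch assuming exactly two color columns; the fifth row (ID 60, "pink") is hypothetical data added only to exercise that case.

```r
data <- data.frame(ID = c(23, 44, 51, 59, 60),
                   color1 = c("red", "blue", "yellow", "green", NA),   # row 5 is hypothetical
                   color2 = c(NA, "purple", NA, "orange", "pink"),
                   stringsAsFactors = FALSE)

# Both filled -> "multicolor"; otherwise take whichever column is not NA
data$color <- ifelse(!is.na(data$color1) & !is.na(data$color2),
                     "multicolor",
                     ifelse(is.na(data$color1), data$color2, data$color1))
data[c("ID", "color")]
```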
Answer:
Here is a seemingly simple solution:
library(dplyr)
library(stringr)
data %>%
  mutate(
    # step 1 - paste `color1` and `color2` together and remove " NA":
    color = gsub("\\sNA", "", paste(color1, color2)),
    # step 2 - count the number of white space characters:
    color = str_count(color, " "),
    # step 3 - label `color` as "multicolor" where `color` != 0:
    color = ifelse(color == 0, color1, "multicolor")) %>%
  # remove the obsolete color columns:
  select(-matches("\\d$"))
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
Data:
data <- data.frame(ID = c(23, 44, 51, 59),
color1 = c("red", "blue", "yellow", "green"),
color2 = c(NA, "purple", NA, "orange"))
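The str_count() step can also be done in base R, which drops the stringr dependency. A sketch assuming the same two-column data; here regmatches() with gregexpr() stands in for str_count(), and pasted and n_spaces are illustrative names.

```r
data <- data.frame(ID = c(23, 44, 51, 59),
                   color1 = c("red", "blue", "yellow", "green"),
                   color2 = c(NA, "purple", NA, "orange"),
                   stringsAsFactors = FALSE)

# Step 1: paste the columns; paste() turns NA into "NA", so strip " NA"
pasted <- gsub("\\sNA", "", paste(data$color1, data$color2))

# Step 2: count spaces with base R (no match gives character(0), i.e. 0)
n_spaces <- lengths(regmatches(pasted, gregexpr(" ", pasted, fixed = TRUE)))

# Step 3: any remaining space means more than one color
data$color <- ifelse(n_spaces == 0, data$color1, "multicolor")
data[c("ID", "color")]
```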
Answer:
A base R approach:
# get colors from columns named color*
colo <- paste(names(table(unlist(df1[,grep("color",colnames(df1))]))), collapse="|")
colo
[1] "blue|green|orange|purple|red|yellow"
# match the colors and do the conversion
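data.frame(
  ID = df1$ID,
  color = apply(df1, 1, function(x) {
    # keep the entries of this row that match one of the known colors
    y <- x[grep(colo, x)]
    # more than one match means more than one color
    if (length(y) > 1) y <- "multicolor"
    y
  })
)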
ID color
1 23 red
2 44 multicolor
3 51 yellow
4 59 multicolor
Data
df1 <- structure(list(ID = c(23L, 44L, 51L, 59L), color1 = c("red",
"blue", "yellow", "green"), color2 = c(NA, "purple", NA, "orange"
)), class = "data.frame", row.names = c(NA, -4L))
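In this answer the regular expression colo is only used to pick out the non-missing colors of each row. A sketch of the same apply() idea that drops the NAs directly, so no pattern has to be built from the data; it assumes the color columns are the ones whose names start with "color", and color_cols and out are illustrative names.

```r
df1 <- structure(list(ID = c(23L, 44L, 51L, 59L),
                      color1 = c("red", "blue", "yellow", "green"),
                      color2 = c(NA, "purple", NA, "orange")),
                 class = "data.frame", row.names = c(NA, -4L))

# Assumption: the color columns are named color1, color2, ...
color_cols <- grep("^color", colnames(df1), value = TRUE)

out <- data.frame(
  ID = df1$ID,
  color = apply(df1[color_cols], 1, function(x) {
    y <- x[!is.na(x)]                      # colors present in this row
    if (length(y) > 1) "multicolor" else unname(y[1])
  }),
  stringsAsFactors = FALSE
)
out
```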