拨开荷叶行,寻梦已然成。仙女莲花里,翩翩白鹭情。
IMG-LOGO
主页 文章列表 在R中检测具有多个观察值的行

在R中检测具有多个观察值的行

白鹭 - 2022-03-02 2156 0 0

我有一个这样的资料集。我想识别在“颜色”列中具有多个值的所有观察结果,并将它们替换为“多色”

ID  color1   color2
23   red      NA
44   blue     purple
51   yellow   NA
59   green    orange

像这样:

ID  color   
23   red      
44   multicolor     
51   yellow     
59   multicolor   

任何想法将不胜感激,谢谢!

uj5u.com热心网友回复:

这是在 tidyverse 中执行此操作的一种方法。

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols = starts_with("color"), values_to = "color", values_drop_na  = TRUE) %>% 
  group_by(ID) %>% 
  summarize(n = n(),
            color = toString(color), .groups = "drop") %>% 
  mutate(color = if_else(n > 1, "multicolor", color)) %>% 
  select(-n)

# # A tibble: 4 x 2
#      ID color     
#   <int> <chr>     
# 1    23 red       
# 2    44 multicolor
# 3    51 yellow    
# 4    59 multicolor

我是故意这样做的。请注意,如果您停在该summarize()行之后,您将获得实际颜色。

# # A tibble: 4 x 3
#      ID     n color        
#   <int> <int> <chr>        
# 1    23     1 red          
# 2    44     2 blue, purple 
# 3    51     1 yellow       
# 4    59     2 green, orange

如果您有许多颜色列,而不仅仅是 2 个,这将可以缩放。使用它,有很多方法可以调整这样的东西。


资料

df <- read.table(textConnection("ID  color1   color2
23   red      NA
44   blue     purple
51   yellow   NA
59   green    orange"), header = TRUE)

uj5u.com热心网友回复:

你可以这样做,假设data是你的资料集。

library(dplyr)

data <- data.frame(ID = c(23, 44, 51, 59),
                   color1 = c("red", "blue", "yellow", "green"),
                   color2 = c(NA, "purple", NA, "orange"))

data %>% 
  mutate(color = ifelse(is.na(color2), color1, "multicolor")) %>% 
  select(ID, color)

uj5u.com热心网友回复:

这是一个看似简单的解决方案:

library(dplyr)
library(stringr)
data %>%
  mutate(
    # step 1 - paste `color1` and `color2` together and remove " NA":
    color = gsub("\\sNA", "", paste(color1, color2)),
    # step 2 - count the number of white space characters:
    color = str_count(color, " "),
    # step 3 - label `color` as "multicolor" where `color` != 0:
    color = ifelse(color == 0, color1, "multicolor")) %>%
  # remove the obsolete color columns: 
  select(-matches("\\d$"))
  ID      color
1 23        red
2 44 multicolor
3 51     yellow
4 59 multicolor

资料:

data <- data.frame(ID = c(23, 44, 51, 59),
                   color1 = c("red", "blue", "yellow", "green"),
                   color2 = c(NA, "purple", NA, "orange"))

uj5u.com热心网友回复:

基础R的方法

# get colors from columns named color*
colo <- paste(names(table(unlist(df1[,grep("color",colnames(df1))]))), collapse="|")

colo
[1] "blue|green|red|yellow|orange|purple"

# match the colors and do the conversion
data.frame( 
  ID=df1$ID, 
  color=apply( df1, 1, function(x){ 
    y=x[grep(colo, x)];
    if(length(y)>1){y="multicolor"}; y } ) )
  ID      color
1 23        red
2 44 multicolor
3 51     yellow
4 59 multicolor

资料

df1 <- structure(list(ID = c(23L, 44L, 51L, 59L), color1 = c("red", 
"blue", "yellow", "green"), color2 = c(NA, "purple", NA, "orange"
)), class = "data.frame", row.names = c(NA, -4L))
标签:

0 评论

发表评论

您的电子邮件地址不会被公开。 必填的字段已做标记 *