Identify and count spells (Distinctive events within each group) The Next CEO of Stack...
Is there a rule of thumb for determining the amount one should accept for a settlement offer?
Is it a bad idea to plug the other end of ESD strap to wall ground?
MT "will strike" & LXX "will watch carefully" (Gen 3:15)?
Creating a script with console commands
Gödel's incompleteness theorems - what are the religious implications?
Calculating discount not working
Is it correct to say moon starry nights?
Upgrading From a 9 Speed Sora Derailleur?
Arity of Primitive Recursive Functions
How to pronounce fünf in 45
Can I cast Thunderwave and be at the center of its bottom face, but not be affected by it?
How should I connect my cat5 cable to connectors having an orange-green line?
How to find if SQL server backup is encrypted with TDE without restoring the backup
How can I replace x-axis labels with pre-determined symbols?
Man transported from Alternate World into ours by a Neutrino Detector
Is a distribution that is normal, but highly skewed, considered Gaussian?
That's an odd coin - I wonder why
How did scripture get the name bible?
Why did the Drakh emissary look so blurred in S04:E11 "Lines of Communication"?
My boss doesn't want me to have a side project
Small nick on power cord from an electric alarm clock, and copper wiring exposed but intact
Is it okay to majorly distort historical facts while writing a fiction story?
How badly should I try to prevent a user from XSSing themselves?
Why doesn't Shulchan Aruch include the laws of destroying fruit trees?
Identify and count spells (Distinctive events within each group)
The Next CEO of Stack OverflowR - list to data frameCount number of rows within each groupCounting unique / distinct values by group in a data frameR: find relative weight within each group and within the entire dataframeR: how to calculate summary for each group and all the data?count the number of distinct variables in a groupusing tidyverse; counting after and before change in value, within groups, generating new variables for each unique shiftDistinct in r within groups of datahow to get count and distinct count with group by in dataframe RNest a dataframe by group, but include extra rows within each groupChange value by group based in reference within group
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
r dataframe dplyr time-series tidyverse
edited 2 hours ago
Thomas Speidel
asked 7 hours ago
Thomas SpeidelThomas Speidel
359216
359216
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago
add a comment |
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago
2
2
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago
add a comment |
6 Answers
6
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
edited 5 hours ago
answered 7 hours ago
markusmarkus
15k11336
15k11336
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
1
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
answered 7 hours ago
MrFlickMrFlick
124k11141173
124k11141173
add a comment |
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
edited 7 hours ago
answered 7 hours ago
Hector HaffendenHector Haffenden
579216
579216
add a comment |
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
answered 6 hours ago
tmfmnktmfmnk
3,6211516
3,6211516
add a comment |
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
answered 5 hours ago
M-MM-M
7,17962146
7,17962146
add a comment |
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
edited 1 hour ago
answered 2 hours ago
akrunakrun
418k13207282
418k13207282
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
7 hours ago