# A term for "a series with lost data"


#### vgiv

##### Member
Let us suppose that you had a series of data (e.g. a set of daily measurements of temperature) and have randomly removed data points from it (a pretty large part of the data, e.g., 95%). How can you call the resulting series? "A decimated series", "a narrowed series", "a winnowed series", or something else?

I'm sure that there is a special term for it, but cannot find it.

#### suzi br

##### Senior Member
The key word would probably be RANDOM, e.g. randomly selected.

#### entangledbank

##### Senior Member
I don't know of a term for this. A sequence or matrix that only has a few non-empty or non-zero terms is called sparse.

#### vgiv

##### Member
The translator of my paper into English has written "a winnowed series", but I rather doubt it.

entangledbank, "a sparse series" is a slightly different term. I want to stress that a part of my data is lost, not zeroed.

suzi br, yes, one can write, say, "a randomly selected subseries", but it is too long.
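
The lost-versus-zeroed distinction drawn above can be illustrated with a minimal Python sketch. The four temperature values, the use of `None` as a marker for a lost reading, and the helper name `mean_ignoring_missing` are all assumptions made for illustration; nothing here comes from the paper under discussion.

```python
def mean_ignoring_missing(values):
    """Average only the readings that survived; None marks a lost reading."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

# Hypothetical readings: the same two days are affected in both variants.
zeroed = [12.0, 0.0, 0.0, 15.0]    # values replaced by zero (a "sparse" sequence)
lost = [12.0, None, None, 15.0]    # values simply absent

mean_zeroed = sum(zeroed) / len(zeroed)   # the zeros drag the average down
mean_lost = mean_ignoring_missing(lost)   # only surviving values contribute

print(mean_zeroed, mean_lost)
```

The zeroed version treats the gaps as real measurements of zero, while the lost version treats them as unknown, which is why "sparse" suggests the wrong thing here.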

#### entangledbank

##### Senior Member
It appears there is a winnowing algorithm in machine learning, but it doesn't fit what you want. And 'decimate' is used to mean reducing the sampling rate, which is similar but again not exactly what you want.
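
To make the difference between decimation and random loss concrete, here is a small Python sketch. The function names `decimate_regular` and `thin_randomly`, the fixed seed, and the made-up year of daily temperatures are illustrative assumptions. Decimation in the signal-processing sense normally also applies a low-pass filter before downsampling, which this sketch leaves out.

```python
import random

def decimate_regular(values, step):
    """Keep every `step`-th sample (plain downsampling, no anti-alias filter)."""
    return values[::step]

def thin_randomly(values, keep_fraction, seed=0):
    """Keep a random subset of (index, value) pairs, preserving time order."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(values) * keep_fraction))
    kept = sorted(rng.sample(range(len(values)), n_keep))
    return [(i, values[i]) for i in kept]

# Hypothetical daily temperatures for one year (made-up numbers).
daily = [10.0 + 0.01 * day for day in range(365)]

regular = decimate_regular(daily, 20)    # evenly spaced: days 0, 20, 40, ...
survivors = thin_randomly(daily, 0.05)   # roughly 5% of days, unevenly spaced

print(len(regular), len(survivors))
```

Both results are much shorter than the original, but only the first has a regular, known spacing; the second keeps the original day index alongside each value because the gaps between survivors are unpredictable.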

#### Edinburgher

##### Senior Member
I suppose you could call the sequence sparsely-sampled or perhaps compacted. Presumably the purpose of the compaction is to avoid storing unnecessary data, while at the same time taking care you don't lose too much significant information about the continuous function you are sampling. For example, if you are measuring the heights of ocean tides at a coastal location, for the purpose of carrying out a harmonic analysis, you really don't need to sample it every second, or every minute. Every ten minutes will probably suffice.
~~How~~ What can you call the resulting ~~series~~ sequence?
Strictly-speaking a series is a sum. Not in everyday language, but in the language of mathematics.

#### entangledbank

##### Senior Member
Yes, I've been googling for things like "decimate" "sequence", because "sequence" gives much more relevant results than "series".

#### vgiv

##### Member
I suppose you could call the sequence sparsely-sampled or perhaps compacted. Presumably the purpose of the compaction is to avoid storing unnecessary data, while at the same time taking care you don't lose too much significant information about the continuous function you are sampling. For example, if you are measuring the heights of ocean tides at a coastal location, for the purpose of carrying out a harmonic analysis, you really don't need to sample it every second, or every minute. Every ten minutes will probably suffice.

Alas, that is not my case, I think. I'm speaking about chaotically corrupted data rather than intentionally compressed or compacted data.

Strictly-speaking a series is a sum. Not in everyday language, but in the language of mathematics.
Sums? I've never heard of that. I don't know exactly about mathematics, but in astronomy the term "time series" is widely used, and it means just "a sequence of measurements" (e.g. [1601.03536] Time series analysis of long-term photometry of BM Canum Venaticorum).

#### JulianStuart

##### Senior Member
If I had removed the data points from the set, I would refer to it as a "reduced data set". If it has been done to make the set smaller, by some defined protocol, it could be a "compressed data set". The context isn't very clear (your title uses "lost" data which implies accidental) - there are many ways to "compress" sets of numbers, some lossy, some lossless. (Data compression - Wikipedia, the free encyclopedia)

If you have (effectively) randomly selected 5% of the data to retain, then I have no idea what that might be called!


#### vgiv

##### Member
your title uses "lost" data which implies accidental
Yes, that is exactly the case! Imagine that a scientist measured the air temperature once a day. But a good deal of his records were lost or corrupted, so now only occasional values of temperature (say, one or two a month) are available. There was a whole time series (once a day) and now one has only a randomly selected subset of it. In Russian one can call the resulting series "прореженный (ряд)". Google Translate gives for it "decimated", "thinned" or "sparse (series)", but I'm not sure about any of these terms.


#### JulianStuart

##### Senior Member
I was confused by your use of "randomly selected" - I thought the selection was an action performed by the researcher(!)
I've not encountered this situation very often, so I know of no "official" name for it. I'd refer to a "residual" data set or the "residue" or "remains" of the original data set. As an adjective, one might contemplate "vestigial" dataset
(from the dictionary under vestige:
a mark, trace, or visible evidence of something that is no longer present or in existence: the last vestiges of a once great empire).

#### entangledbank

##### Senior Member
I can think of many things such a scientist would call that data, but none would be printable in a scientific journal.

#### vgiv

##### Member
I was confused by your use of "randomly selected" - I thought the selection was an action performed by the researcher(!)
From the viewpoint of data analysis there is no difference between the case with a researcher who rolls a dice to select some subset of data and any other stochastic process

I've not encountered this situation very often so know of no "official" name for it. I'd refer to a "residual" data set or the "residue" or "remains" of the original data set. As an adjective, one might contemplate "vestigial"
Yes, "vestigial" sounds great. And "residual series" seems to be good too. Thanks a lot.

#### Edinburgher

##### Senior Member
Perhaps "trace data" might also serve. We use "trace" sometimes for when there is only a tiny bit of something, often when the quantity is too small to measure.

#### JulianStuart

##### Senior Member
From the viewpoint of data analysis there is no difference between the case with a researcher who rolls a dice to select some subset of data and any other stochastic process
The resulting datasets may behave the same way, but the dictionary entry for select suggests it's the wrong word for the situation. Preference seems like an antonym for random!
select
• to choose in preference; pick: Only the best students were selected for admission.

#### Myridon

##### Senior Member
From the viewpoint of data analysis there is no difference between the case with a researcher who rolls a dice to select some subset of data and any other stochastic process
Throwing out the results that don't support your hypothesis is exactly the same as randomly selecting data?

#### vgiv

##### Member
Myridon, what hypothesis? I've said that data points were removed randomly.

#### PaulQ

##### Senior Member
It strikes me that what is left is not a series, it is simply "the remaining/surviving data".

#### Edinburgher

##### Senior Member
Throwing out the results that don't support your hypothesis is exactly the same as randomly selecting data?
Myridon, what hypothesis? I've said that data points were removed randomly.
vgiv: Myridon was only joking, but you should not use the word "removed" here, because it carries an implication that data were deliberately selected for removal.

Myridon: See vgiv's reply #8 which, owing to finger-trouble, appeared in the box quoting me instead of below it, making it look as though it was part of what I had written (and was therefore uninteresting). The missing data were apparently obscured/corrupted by some natural interfering phenomenon.

#### vgiv

##### Member
Thanks to everybody! I cannot say that everything is crystal clear now, but I got some ideas.
