Thursday, 01 May 2014
Post your email-data filtering process, and any questions related to it, here.
11 years ago
·
#411
We need to stick to the Unscheduled & Original data and leave out the Scheduled and Reply data for now, so those emails have to be removed from the data set. The main operator for this is "Filter Documents (by Content)", which goes inside the Process Documents from Data operator after the data has been read with the "Read Excel" operator.
We then use the Filter Documents (by Content) operator four times, once per keyword, to remove the Scheduled and Reply data. The strings I used are: RE: | Maintenance | Scheduled | Routines. I am not using the "planned" string in the filter for now.
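A minimal Python sketch of the idea, assuming each email subject is a plain string. Each call to `filter_by_content` plays the role of one Filter Documents (by Content) operator with case sensitivity off and the inverse condition on, so a document survives only if it does NOT contain the keyword. The function name and the sample subjects are hypothetical; this is not RapidMiner's API.

```python
# Hypothetical stand-in for one Filter Documents (by Content) operator
# with "case sensitive" off and the inverse condition on.
def filter_by_content(docs, keyword, case_sensitive=False, inverse=True):
    kept = []
    for doc in docs:
        if case_sensitive:
            text, key = doc, keyword
        else:
            text, key = doc.lower(), keyword.lower()
        contains = key in text
        # With the inverse condition, keep documents that do NOT match.
        if contains != inverse:
            kept.append(doc)
    return kept

# Made-up subjects for illustration.
subjects = [
    "RE: Server outage",
    "Scheduled maintenance on DB cluster",
    "Routines for weekly backup",
    "Printer not responding",
    "Disk failure on node 7",
]

# One filter per keyword, applied one after another.
for keyword in ["RE:", "Maintenance", "Scheduled", "Routines"]:
    subjects = filter_by_content(subjects, keyword)

print(subjects)  # ['Printer not responding', 'Disk failure on node 7']
```

Chaining four inverse filters keeps only the documents that contain none of the keywords, which is the same as a single pass that drops any document matching any keyword.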
Does anyone have a different opinion on this?
11 years ago
·
#414
When feeding both the email body and the email subject to the Read Excel operator, how are you processing them? Are you filtering on the subject only, and then attaching the body to the filtered subjects to build the training set with categories?
However, when filtering out the scheduled emails and replies with the keywords given to the Filter Documents (by Content) operator, some unscheduled, original emails are also removed, because the word "scheduled" appears in their bodies. As a result, I am losing some data points. Any suggestions?
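The problem can be reproduced with a small Python sketch, assuming each email is a (subject, body) pair and the keyword check runs on the subject and body joined into one text. The keyword list follows the thread; the sample emails and the `is_excluded` helper are made up for illustration.

```python
KEYWORDS = ["re:", "maintenance", "scheduled", "routines"]

def is_excluded(text):
    # Case-insensitive substring check, as with "case sensitive" turned off.
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)

# Hypothetical (subject, body) pairs.
emails = [
    ("Scheduled downtime for DB cluster", "Planned window is Sunday 02:00."),
    ("Server down in building B", "Users report the outage; the scheduled backup also failed."),
    ("Printer not responding", "Paper jam on floor 3."),
]

# Filtering on subject + body together: the body's mention of "scheduled"
# also drops the second email, which is really an unscheduled one.
combined = [e for e in emails if not is_excluded(e[0] + " " + e[1])]

# Filtering on the subject only keeps it.
subject_only = [e for e in emails if not is_excluded(e[0])]

print(len(combined), len(subject_only))  # 1 2
```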
11 years ago
·
#416
That is the problem. I have not used "planned"; I have used "routine" instead.
11 years ago
·
#418
Hi Tapasree,

Your sentences confused me. :P

So here's the procedure to follow:

1. Add the Read Excel operator. When selecting the columns, tick both subject and body. Click Finish.
2. Add the Process Documents from Data operator.
3. Inside the Process Documents from Data operator, add a Filter Documents (by Content) operator for each keyword, entering one of the following in the string field: re:, scheduled, planned, maintenance. Make sure case sensitive is not selected and inverse condition is selected.



Let me know if you face problems. :)
11 years ago
·
#419
Hi Adipta,

Thanks for your help!
But the problem still exists. Let me try to clarify it.

I used exactly the procedure you described, but when filtering subject and body together, some emails contain the word "scheduled" in the body even though, according to the master file, they are unscheduled emails.

I hope that makes it clearer.
11 years ago
·
#424
Hi,

I don't understand the phrase 'filtering both subject and body together'. Filtering is done on only one attribute. Since replies (re:) have to be removed from the emails, the subject attribute is the one to consider. You can do that with the Set Role operator, taking subject as the id. Then, if a subject contains re, scheduled, planned or maintenance, the corresponding body is filtered out along with it.
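A rough Python analogue of "filter on the subject, and the body goes with it", assuming each email is a record with hypothetical `subject` and `body` fields. It is only an illustration of the idea, not how the Set Role operator works internally. One extra assumption beyond the thread: a plain substring test for "scheduled" would also catch "Unscheduled", so this sketch uses whole-word, case-insensitive regular expressions, with "re:" anchored to the start of the subject.

```python
import re

# Assumed patterns for illustration, not the thread's exact operator settings.
PATTERNS = [
    re.compile(r"^\s*re:", re.IGNORECASE),
    re.compile(r"\bscheduled\b", re.IGNORECASE),
    re.compile(r"\bplanned\b", re.IGNORECASE),
    re.compile(r"\bmaintenance\b", re.IGNORECASE),
]

def keep_email(email):
    # Only the subject is tested; the body stays attached to the same record.
    return not any(p.search(email["subject"]) for p in PATTERNS)

# Hypothetical emails.
emails = [
    {"subject": "Re: VPN issue", "body": "Still failing."},
    {"subject": "Unscheduled outage in data centre",
     "body": "The scheduled job was not the cause."},
    {"subject": "Planned maintenance window", "body": "Saturday 22:00."},
]

kept = [e for e in emails if keep_email(e)]
print([e["subject"] for e in kept])  # ['Unscheduled outage in data centre']
```

The second email survives even though its body mentions "scheduled", because the body is never tested.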

And to see whether an email is scheduled or unscheduled, you do not have to refer to the master file. The master file is only there to help you categorize the emails.

Can you please share screenshots?

Get in touch if you face problems.

Thanks! :)
11 years ago
·
#425
Yes, there are some issues with whether the obtained output matches the master file, with the category definitions, etc. Both the filtered test data and the filtered training data need to be verified before we proceed, so that we leave no stone unturned. We will work that out together tomorrow. Just chill for the time being. :)
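One way to do that verification, sketched in Python under loud assumptions: the master file is represented here as a hypothetical dict from email id to a (schedule status, message type) pair, and the filter output as a set of surviving ids. Neither structure comes from the thread; the point is only to list emails the filters dropped but should have kept, and vice versa.

```python
# Hypothetical master-file labels: email id -> (schedule status, message type).
master = {
    1: ("Scheduled", "Original"),
    2: ("Unscheduled", "Original"),
    3: ("Unscheduled", "Original"),
    4: ("Unscheduled", "Reply"),
}

# Hypothetical ids that survived the keyword filters.
kept_ids = {3, 4}

# Emails we actually want: Unscheduled & Original.
should_keep = {
    i for i, (status, kind) in master.items()
    if status == "Unscheduled" and kind == "Original"
}

wrongly_dropped = sorted(should_keep - kept_ids)  # lost data points
wrongly_kept = sorted(kept_ids - should_keep)     # slipped through the filters
print(wrongly_dropped, wrongly_kept)  # [2] [4]
```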
11 years ago
·
#426
Ok. :)
11 years ago
·
#463
:) B)