@shivanshuk

Added the following parts of Milestone 1:

  • Read the dataset
  • Created a master regex based on the dataset
  • Removed unwanted data from the dataset
  • Replaced unwanted data using the regex
  • Cleansed the dataset
  • Gathered and converted the data, for a total shape of 8500 × 4
  • Replaced all junk characters, email addresses, body text, cid images, etc. in the dataset
  • Saved the processed data (v1.0) under the datasets folder
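The regex-based cleaning steps above can be sketched roughly as follows. This is a minimal sketch assuming Python's standard `re` module; the actual master regex from this PR is not shown, so `MASTER_PATTERN`, `clean_text`, and the specific patterns (cid image references, email addresses, stray non-alphanumeric characters) are hypothetical illustrations of the idea of combining all unwanted-data patterns into one alternation.

```python
import re

# Hypothetical "master regex": a single alternation covering the kinds of
# unwanted data described above. The patterns are illustrative assumptions,
# not the ones used in this PR.
MASTER_PATTERN = re.compile(
    r"\[cid:[^\]]*\]"                  # bracketed cid image refs, e.g. [cid:image001.png@01D5]
    r"|\bcid:\S+"                      # bare cid image refs
    r"|[\w.+-]+@[\w-]+(?:\.[\w-]+)+"   # email addresses
    r"|[^A-Za-z0-9\s]"                 # any char that is not an ASCII letter, digit, or whitespace
)


def clean_text(text: str) -> str:
    """Replace every master-regex match with a space, then collapse whitespace."""
    replaced = MASTER_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", replaced).strip()


if __name__ == "__main__":
    sample = "Hi team, see [cid:image001.png@01D5A1B2] and mail john.doe@example.com!!"
    print(clean_text(sample))  # Hi team see and mail
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together; the final whitespace collapse then removes the extra gaps. With pandas, the function could be applied column-wise, e.g. `df["text"].apply(clean_text)`, where the column name `text` is hypothetical.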

@shivanshuk marked this pull request as ready for review on May 3, 2020 19:18
@shivanshuk

@rgnanas
@SrikanthEnuguru
Please review these changes.

@shivanshuk changed the title from "Added pre procession version 1.0 to clean up data set and produce a n…" to "Added pre processed version 1.0 to clean up data set and produce a n…" on May 3, 2020