# Guideline to Transfer Large Movie Review Dataset - aclImdb to MindRecord
<!-- TOC -->
- [What does the example do](#what-does-the-example-do)
- [How to use the example to generate MindRecord](#how-to-use-the-example-to-generate-mindrecord)
- [Download aclImdb dataset and unzip](#download-aclimdb-dataset-and-unzip)
- [Generate MindRecord](#generate-mindrecord)
- [Create MindDataset By MindRecord](#create-minddataset-by-mindrecord)
<!-- /TOC -->
## What does the example do
This example is used to read data from aclImdb dataset and generate mindrecord. It justtransfers the aclImdb dataset to mindrecord without any data preprocessing. You canmodify the example or follow the
example to implement your own example.
1.run.sh: generate MindRecord entry script.
- gen_mindrecord.py: read the aclImdb data and transfer it to mindrecord.
2.run_read.py: create MindDataset by MindRecord entry script.
- create_dataset.py: use MindDataset to read MindRecord to generate dataset.
## How to use the example to generate MindRecord
Download aclImdb dataset, transfer it to mindrecord, use MindDataset to read mindrecord.
### Download aclImdb dataset and unzip
1.Download the training data zip.
> [aclImdb datasetdownload address](http://ai.stanford.edu/~amaas/data/sentiment/) **-> Large Movie Review Dataset v1.0**
2. Unzip the training data to dir example/nlp_to_mindrecord/aclImdb/data.
```
tar -zxvf aclImdb_v1.tar.gz -C {your-mindspore}/example/nlp_to_mindrecord/aclImdb/data/
```
###Generate MindRecord
1.Run the run.sh script.
```bash
bash run.sh
```
2. Output like this:
```
...
>> begin generate mindrecord by train data
...
[INFO] ME(20928,python):2020-05-07-23:02:40.066.546 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24320 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.078.344 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
>> transformed 24576 record...
[INFO] ME(20928,python):2020-05-07-23:02:40.090.237 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 256 records successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.099.302 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:45] Build] Init header from mindrecord file for index successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.122.271 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:586] DatabaseWriter] Init index db for shard: 0 successfully.
[INFO] ME(20928,python):2020-05-07-23:02:40.932.360 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransaction] Insert 24596 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:40.953.177 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransa ction] Insert 404 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:40.963.400 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:606] DatabaseWriter] Generate index db for shard: 0 successfully.
[INFO] ME(20928:139630558652224,MainProcess):2020-05-07-23:02:40.964.973 [mindspore/mindrecord/filewriter.py:313] Thelist of mindrecord files created are: ['output/aclImdb_train.mindrecord'], and the
list of index files are: ['output/aclImdb_train.mindrecord.db']
>> begin generate mindrecord by test data
...
>> transformed 24576 record...
[INFO] ME(20928,python):2020-05-07-23:02:42.120.007 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData]Write 256 records successfully.
>> transformed 24832 record...
[INFO] ME(20928,python):2020-05-07-23:02:42.128.862 [mindspore/ccsrc/mindrecord/io/shard_writer.cc:667] WriteRawData] Write 168 records successfully.
[INFO] ME(20928,python):2020-05-07-23:02:42.151.237 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:586] DatabaseWriter] Init index db for shard: 0 successfully.
[INFO] ME(20928,python):2020-05-07-23:02:42.935.496 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:535] ExecuteTransaction] Insert 25000 rows to index db.
[INFO] ME(20928,python):2020-05-07-23:02:42.949.319 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:606] DatabaseWriter] Generate index db for shard: 0 successfully.
[INFO] ME(20928:139630558652224,MainProcess):2020-05-07-23:02:42.951.794 [mindspore/mindrecord/filewriter.py:313] The list of mindrecord files created are: ['output/aclImdb_test.mindrecord'], and the
list of index files are: ['output/aclImdb_test.mindrecord.db']
-id : the id "3219" isfromreview docs like **3219**_10.txt.
-label: indicates whetherthe review is positive or negative, positive: 0, negative:1.
-score: the score "10" isfromreview docs like 3219_**10**.txt.
- review : the review is from the review dos's content.
新值
# Guideline to Transfer Caltech-UCSD Birds-200-2011 Dataset to MindRecord
<!-- TOC -->
- [What does the example do](#what-does-the-example-do)
- [How to use the example to generate MindRecord](#how-to-use-the-example-to-generate-mindrecord)
- [Download Caltech-UCSD Birds-200-2011 dataset and unzip](#download-caltech-ucsd-birds-200-2011-dataset-and-unzip)
- [Generate MindRecord](#generate-mindrecord)
- [Create MindDataset By MindRecord](#create-minddataset-by-mindrecord)
<!-- /TOC -->
## What does the example do
This example is used to read data from Caltech-UCSD Birds-200-2011 dataset and generatemindrecord. It just transfers the Caltech-UCSD Birds-200-2011 dataset to mindrecordwithout any data preprocessing. You can modify the example or follow the example to implement your own example.
1. run.sh: generate MindRecord entry script.
- gen_mindrecord.py : read the Caltech-UCSD Birds-200-2011 data and transfer it to mindrecord.
2.run_read.py: createMindDataset by MindRecord entry script.
- create_dataset.py: use MindDataset to read MindRecord to generate dataset.
## How to use the example to generate MindRecord
Download Caltech-UCSD Birds-200-2011 dataset, transfer it to mindrecord, use MindDataset to read mindrecord.
### Download Caltech-UCSD Birds-200-2011 dataset and unzip
[INFO] MD(11253,python):2020-05-20-16:22:21.211.688 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:59] Build] Init header from mindrecord file for index successfully.
[INFO] MD(11253,python):2020-05-20-16:22:21.236.799 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:600] DatabaseWriter] Init index db for shard: 0 successfully.
[INFO] MD(11253,python):2020-05-20-16:22:21.964.034 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:549] ExecuteTransaction] Insert 11788 rows to index db.
[INFO] MD(11253,python):2020-05-20-16:22:21.978.087 [mindspore/ccsrc/mindrecord/io/shard_index_generator.cc:620] DatabaseWriter] Generate index db for shard: 0 successfully.
[INFO] ME(11253:139923799271232,MainProcess):2020-05-20-16:22:21.979.634 [mindspore/mindrecord/filewriter.py:313] The list of mindrecord files created are:['output/CUB_200_2011.mindrecord'], and the list of index files are: ['output/CUB_200_2011.mindrecord.db']
[INFO] MD(12469,python):2020-05-20-16:26:38.308.797 [mindspore/ccsrc/dataset/util/task.cc:31] operator()] Op launched, OperatorId:0 Thread ID 139702598620928 Started.
[INFO] MD(12469,python):2020-05-20-16:26:38.322.433 [mindspore/ccsrc/mindrecord/io/shard_reader.cc:343] ReadAllRowsInShard] Get 11788 records from shard 0 index.
[INFO] MD(12469,python):2020-05-20-16:26:38.386.904 [mindspore/ccsrc/mindrecord/io/shard_reader.cc:1058] CreateTasks] Total rows is 11788
[INFO] MD(12469,python):2020-05-20-16:26:38.387.068 [mindspore/ccsrc/dataset/util/task.cc:31] operator()] Parallel Op Worker Thread ID 139702590228224 Started.
[INFO] MD(12469,python):2020-05-20-16:26:38.387.272 [mindspore/ccsrc/dataset/util/task.cc:31] operator()] Parallel Op Worker Thread ID 139702581044992 Started.
[INFO] MD(12469,python):2020-05-20-16:26:38.387.465 [mindspore/ccsrc/dataset/util/task.cc:31] operator()] Parallel Op Worker Thread ID 139702572652288 Started.
[INFO] MD(12469,python):2020-05-20-16:26:38.387.617 [mindspore/ccsrc/dataset/util/task.cc:31] operator()] Parallel Op Worker Thread ID 139702564259584 Started.