The following methods are available for ItemLoader objects:
Sr. No | Method | Description |
---|---|---|
1 | get_value(value, *processors, **kwargs) | Processes the given value with the given processors and keyword arguments. The available keyword argument is re: a regular expression used to extract data from the given value, applied before the processors. |
2 | add_value(field_name, value, *processors, **kwargs) | Processes the given value and then adds it to the given field. The value is first passed through get_value() with the given processors and kwargs, and then through the field's input processor. The result is appended to the data collected for that field, so if the field already contains data, the new data is added to it. field_name can be None, in which case the value is a dictionary mapping field names to values, so values for several fields can be added at once. |
3 | replace_value(field_name, value, *processors, **kwargs) | Similar to add_value(), but replaces the data collected for the field with the new value instead of adding to it. |
4 | get_xpath(xpath, *processors, **kwargs) | Similar to ItemLoader.get_value(), but receives an XPath expression instead of a value. The expression is used to extract a list of Unicode strings from the selector associated with the ItemLoader. Parameters: xpath, the XPath expression to extract data from; re, a regular expression string or compiled pattern used to extract data from the selected XPath region. |
5 | add_xpath(field_name, xpath, *processors, **kwargs) | Similar to ItemLoader.add_value(), but receives an XPath expression instead of a value. The expression is used to select a list of strings from the selector associated with the ItemLoader, and the result is added to the given field. Parameter: xpath, the XPath expression to extract data from. |
6 | replace_xpath(field_name, xpath, *processors, **kwargs) | Similar to add_xpath(), but replaces the data collected for the field instead of adding to it. |
7 | get_css(css, *processors, **kwargs) | Similar to ItemLoader.get_value(), but receives a CSS selector instead of a value. The selector is used to extract a list of Unicode strings from the selector associated with the ItemLoader. Parameters: css, the CSS selector string to extract data from; re, a regular expression string or compiled pattern used to extract data from the selected CSS region. |
8 | add_css(field_name, css, *processors, **kwargs) | Similar to add_value(), but receives a CSS selector instead of a value. The data extracted with the selector is added to the given field. Parameter: css, the CSS selector string to extract data from. |
9 | replace_css(field_name, css, *processors, **kwargs) | Similar to add_css(), but replaces the data collected for the field instead of adding to it. |
10 | load_item() | Populates the item with the data collected so far and returns it. The collected data is first passed through the output processors to obtain the final value assigned to each field. |
11 | nested_css(css, **context) | Creates a nested loader using a CSS selector. The supplied CSS selector is applied relative to the selector associated with the ItemLoader. |
12 | nested_xpath(xpath, **context) | Creates a nested loader using an XPath selector. The supplied XPath is applied relative to the selector associated with the ItemLoader. |
Scrapy – Item Loaders
In this article, we are going to discuss Item Loaders in Scrapy.
Scrapy is used for extracting data using spiders that crawl through a website. The scraped data can be collected and processed in the form of Scrapy Items. Item Loaders play a significant role in parsing the data before the Item fields are populated.