python - Delete h2 until you reach the next h2 in BeautifulSoup


Consider the following HTML:

    <h2 id="example">cool stuff</h2>
    <ul>
      <li>hi</li>
    </ul>
    <div>
      <h2 id="cool"></h2>
      <ul>
        <li>zz</li>
      </ul>
    </div>

and the following list:

ignore_list = ['example','lalala'] 

My goal, while going through the HTML with BeautifulSoup, is to find every h2 whose id is in the list (ignore_list) and delete the ul and li elements under it until the next h2 is reached. Then I check whether that next h2 is in the ignore list; if it is, I delete the ul and li elements until the following h2 (or, if there are no h2s left, delete the ul and li elements under the current one and stop).

How I see the process going: read the h2s top-down in the DOM. If the id of one is in ignore_list, delete the ul and li elements under that h2 until the next h2 is reached. If there is no further h2, delete the ul and li elements and stop.
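The process described above could be sketched roughly as follows. This is only a minimal illustration run against the sample HTML from the question; the helper name `strip_lists_after_ignored` is hypothetical, and it assumes the lists to delete are direct siblings of the h2 (which may not hold for the full page).

```python
from bs4 import BeautifulSoup

html = """
<h2 id="example">cool stuff</h2>
<ul><li>hi</li></ul>
<div>
  <h2 id="cool"></h2>
  <ul><li>zz</li></ul>
</div>
"""
ignore_list = ['example', 'lalala']


def strip_lists_after_ignored(soup, ignore_ids):
    """Hypothetical helper: delete ul siblings after each ignored h2, up to the next h2."""
    for h2 in soup.find_all('h2'):
        if h2.get('id') not in ignore_ids:
            continue
        # Snapshot the sibling tags first, because decompose() mutates the tree.
        for sibling in list(h2.find_next_siblings(True)):
            if sibling.name == 'h2':
                break  # reached the next heading: stop deleting
            if sibling.name == 'ul':
                sibling.decompose()  # removes the ul together with its li children
    return soup


soup = strip_lists_after_ignored(BeautifulSoup(html, 'html.parser'), ignore_list)
remaining = [li.get_text() for li in soup.find_all('li')]
print(remaining)  # ['zz']
```

In the sample markup the second h2 is nested inside a div, so it is not a sibling of the first one; the loop simply runs out of siblings and stops, which matches the "no h2 left" case.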

Here is the full HTML I am trying to work with: http://pastebin.com/z3ev9c8n

I am trying to delete the ul and li elements after "see_also". How can I accomplish this in Python?

Below is the solution I came up with.

    # Remove the content we don't want
    for element in body.find_all('h2'):
        try:
            # Heading text without the "[edit]" link label
            current_h2 = element.get_text().replace('[edit]', '')
            # print(current_h2)
            if current_h2 in ignore_list:
                next_div = element.find_next_sibling('div')
                if next_div is not None:
                    next_div.decompose()
                next_ul = element.find_next_sibling('ul')
                if next_ul is not None:
                    next_ul.decompose()
        except (AttributeError, TypeError):
            continue
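One caveat worth checking with this approach: find_next_sibling('ul') returns the first later ul sibling, even when another h2 sits in between. The small sketch below uses made-up markup (not taken from the pastebin page) to show the lookup reaching past the next heading.

```python
from bs4 import BeautifulSoup

# Illustrative markup: heading "a" has no list of its own, heading "b" does.
html = '<h2 id="a">A</h2><p>text</p><h2 id="b">B</h2><ul><li>keep me</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

first_h2 = soup.find('h2', id='a')
# The search skips over the intervening h2 and returns the list under "b".
found = first_h2.find_next_sibling('ul')
belongs_to = found.find_previous_sibling('h2')['id']
print(belongs_to)  # b
```

If heading "a" were in ignore_list, decomposing `found` would remove a list that sits under heading "b", so stopping at the next h2 needs an explicit check.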
